Computer  Science  Department 


TECHNICAL  REPORT 


Toward  a  Fully  Integrated  VLSI  CAD  System: 
From  Custom  to  Fully  Automatic 


Yongtao  You 


Technical  Report  522 

October  1990 



NEW YORK UNIVERSITY


Department  of  Computer  Science 
Courant  Institute  of  Mathematical  Sciences 

251  MERCER  STREET,  NEW  YORK,  N.Y.  10012 


Toward  a  Fully  Integrated  VLSI  CAD  System: 
From  Custom  to  Fully  Automatic 


Yongtao  You 

Computer Science Department

Courant  Institute  of  Mathematical  Sciences 

New  York  University 

251  Mercer  Street 
New  York,  NY  10012 

October  1990 


A  dissertation  in  the  Department  of  Computer  Science  submitted  to  the  faculty 
of  the  Graduate  School  of  Arts  and  Science  in  partial  fulfillment  of  the  requirements 
for  the  degree  of  Doctor  of  Philosophy  at  New  York  University. 


Approved: 


Professor  Alan  R.  Siegel 


Copyright  ©  1990  by  Yongtao  You 
All  Rights  Reserved. 


Toward  a  Fully  Integrated  VLSI  CAD  System: 
From  Custom  to  Fully  Automatic 

Yongtao  You 


Abstract 

This thesis describes an integrated CAD environment that is intended to support
almost all phases of the VLSI circuit design cycle, from high-level circuit description down
to mask generation. Several VLSI CAD tools have been integrated under the
environment, including a multi-level simulator, automatic placement tools, a schematic
layout editor, and a geometry layout editor developed at UC Berkeley.

The multi-level simulator supports top-down design by allowing circuits whose components
are described at different levels to be simulated together. The levels of circuit description
currently supported include a variant of the C programming language for circuit behavior
descriptions, the schematic layout representation, and the Magic layout from which masks
for wafer fabrication can be generated. The hardness of the charge-sharing modeling problem
is resolved, and a new model for it is given.

The schematic layout editor allows designers to specify interconnections among circuit
components  in  a  very  efficient  manner.  It  separates  behavioral  descriptions  of  a  circuit 
from  its  geometric  layout.  Designers  can  have  a  graphical  view  of  their  design,  and  specify, 
within  this  graphical  organization,  the  behavioral  description  of  components  at  different 
levels  of  abstraction.  These  schematic  layouts  with  different  levels  of  representation  can  be 
simulated  using  the  multi-level  simulator. 


The automatic placement tool presently performs bottom-up iterative improvement,
assisted by simulated annealing when needed. An interactive graphics interface is
provided, which allows human intervention on intermediate as well as final layouts.

Chapter 1


Introduction 


During the last decade, semiconductor technology has undergone rapid evolution, which
makes it possible to place hundreds of thousands of transistors on a single chip. The
complexity of Very Large Scale Integrated (VLSI) circuits has outgrown human design
capabilities, and Computer-Aided Design (CAD) tools have become essential for designers.
CAD tools for VLSI circuit design are themselves undergoing rapid evolution. Today's
designers have at their disposal an entire array of automated design tools: automatic
placement and routing, fault test generation, design rule checking, timing verification, various
levels of simulation, silicon compilers, and many more. These tools greatly increase
productivity by assisting circuit designers through the entire life cycle of VLSI
chip design, from architectural specification to mask generation.

One aspect of CAD tool development that has been largely ignored until recently is
the integration of existing CAD tools. Each tool has its own input format, its own
user interface, its own output format, and its own interpretation of input/output data.
As a result, designers have to concentrate much of their effort
on interfacing various design tools, and pay less attention to the main task, circuit
design. The lack of integration among design tools puts many unnecessary burdens
on designers. These include the following:


•  Managing  the  large  amounts  of  data  generated  during  the  design  process; 

•  Learning  and  remembering  the  user  interaction  methods  for  each  of  the  tools  used 
during  the  design  process; 

•  Generating  and  interpreting  tool  specific  data  formats; 

•  Integrating  a  top-down  design  methodology  with  different  tools  at  each  phase  of 
design. 

Several  integrated  circuit  design  systems  have  been  developed  in  the  last  few  years,  such 
as  ChipBuster  [MIB  86]  and  Cadlab  [MGS  89].  These  systems  provide  a  common  interface 
for  various  tools  integrated  under  the  system,  which  eliminates  the  need  to  learn  different 
interfaces  for  each  tool.  A  data  base  system  is  also  provided  which  makes  the  management 
of  large  amounts  of  data  generated  during  the  design  process  a  lot  easier.  Also,  one  single 
standard  data  format  is  used  by  all  the  tools  under  the  integrated  system  so  that  designers 
do  not  need  to  transfer  data  from  one  format  to  another. 

This thesis is another effort toward the integration of CAD tools. Its goal is to allow
the designer to integrate custom and automatic design styles in a flexible and efficient
manner. In particular, the system should support the top-down design methodology, which
has been proven to be effective in designing large, complex systems. Designers should be
able to design circuits at a very high level of abstraction, such as behavioral descriptions in
a hardware description language, as well as at a very low level of abstraction, such as
geometry layout. Also, the transformation between these levels should be easy and reliable.

We have planned to integrate as many CAD tools under the environment as necessary
in order to reach this goal. The system was designed as a platform capable of supporting
high performance subsystems built as needed. The multi-level simulator and the automatic
placement tool were selected as our starting point. A schematic layout editor was added
later. The VLSI painting and circuit extraction system Magic (designed at UC Berkeley)
and various tools associated with it, such as Crystal and SPICE, were also integrated
under the environment. The system is implemented on a Sun workstation under the Sun/Unix
operating system.

In the rest of the thesis, we explore two important problems of the VLSI
CAD tool family, namely simulation and placement. The main reason for choosing these
two topics is that they are two of the most critical utilities needed by designers, and are
suitable for an individual effort in CAD construction. Some attention will also be given
to the schematic layout editor.

1.1      Simulation 

Simulators at various levels have been among the most important members of the VLSI CAD
tool family. Although the use of these simulators cannot guarantee the correctness of
a design, they can, by using a carefully selected set of tests, identify many of the most
frequently occurring types of error at the earliest possible stage. Many simulators
have been developed, each simulating circuits at a different level of abstraction, such as
system level, functional level, gate level, switch level, and circuit level. Although separate
simulators are available for almost every stage of a chip design, this multiple simulator
approach has several problems:

•  The  design  effort  is  increased  due  to  the  need  to  learn  several  simulation  tools. 

•  As a result of top-down design methodology, most of the time during the design
process, different components of the circuit are described at different levels of abstraction.
Since each simulator operates at just one level of abstraction, only part of the circuit
can be simulated at one time. Interconnections between parts that are described at
different levels cannot be simulated.

•  The effort needed to manually generate test data for a portion of the circuit is usually
excessive. Interpreting and analyzing test results of a portion of the circuit can also
be time consuming.

•  Simulating  the  whole  circuit  repeatedly  at  a  low  level  each  time  a  small  change  is 
made  is  both  unnecessary  and  time  consuming. 

The main drawback here is that the top-down design methodology is not supported. The
whole circuit has to be described at the same level of abstraction in order to be simulated
as a whole. The recognition of these problems leads to the development of a multi-level
simulator, which can simulate a circuit whose components are described at vastly different
levels of abstraction. It allows designers to start with a very high level specification of a
circuit, and then refine the design one small piece at a time. After any refinement, the
whole circuit with the interconnected modules¹ can be simulated even though components
may be specified at any level. By verifying the design after each refinement, design errors
can be found and corrected as early as possible.

As part of our integrated CAD system, a multi-level simulator has been developed.
It supports the top-down design methodology by accepting any hierarchical mix of modules
designed in Magic, in schematic layout, and in CHDL (a superset of the C programming
language). We will discuss it in detail in the following chapters.

1.2      Placement 

Generally speaking, the problem of placement encountered during the physical design of
VLSI chips is to determine locations for its components. One closely related problem is
routing, which determines how these components are to be interconnected. Together these
two processes determine the final layout of the circuit.

¹ We use module, circuit component, and cell interchangeably throughout this thesis.

For different design styles, the placement problem is slightly different. Consider, for
example, three of the most standard styles of chip design: gate-array, standard-cell, and
custom. In gate-array design, components are low level functional blocks of rectangular
shape with various sizes. The circuit is partitioned, by a grid, into small rectangles called slots,
and each component occupies one or more slots. The placement process involves assigning
components to slots so as to optimize any number of objective functions, such as minimizing
the total interconnection length and the number of utilized components. In standard-cell
design, components are rectangular, often with the
same height but different widths. The circuit layout is partitioned into rows with about the
same height as the components, and the placement process involves arranging the components
so that they abut in rows, and a number of objective functions, such as total circuit area,
are optimized. Component placement in custom design is the most complex task among the
three design styles. The components are of arbitrary shape and size, and they can be placed
anywhere within the layout. To cope with the complexity, most automatic placement tools
allow components to have only a small set of regular shapes (e.g. rectangles, L-shapes,
T-shapes, etc.), although usually no dimension restrictions apply.
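
To make the notion of an objective function concrete, the following is a minimal sketch of a "total interconnection length" measure for rectangular components. It assumes the half-perimeter bounding-box estimate, a common stand-in for wire length that is not necessarily the measure used by the tools discussed in this thesis. The names `Cell`, `hpwl`, and `total_wire_length` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """A rectangular component whose lower-left corner sits at (x, y)."""
    name: str
    x: float
    y: float
    width: float
    height: float

    def center(self):
        return (self.x + self.width / 2, self.y + self.height / 2)


def hpwl(net, cells):
    """Half-perimeter of the bounding box around the centers of a net's cells."""
    xs, ys = zip(*(cells[name].center() for name in net))
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_wire_length(nets, cells):
    """Objective function: sum of the estimated lengths of all nets."""
    return sum(hpwl(net, cells) for net in nets)


# Illustrative placement of three cells and two nets (made-up data).
cells = {c.name: c for c in [
    Cell("a", 0, 0, 2, 2),
    Cell("b", 4, 0, 2, 2),
    Cell("c", 0, 4, 2, 2),
]}
nets = [("a", "b"), ("a", "b", "c")]
print(total_wire_length(nets, cells))
```

A placement procedure would evaluate such a function on each candidate layout and keep the moves that reduce it.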

Many automatic placement tools have been developed to assist designers in the physical
design of circuits. All of them use heuristics, which means that their performance is good
on some circuits and poor on others. This is due to the fact that even the simplest placement
problems in any of the three cases are NP-complete for very reasonable objective functions
such as minimizing the total area or total wire length [Don 80], [SaB 80].

As part of our integrated CAD system, we have developed an experimental placement
tool. Designers can experiment with and modify layouts using different objective functions and
search procedures. They can even place part of the circuit by hand, which may greatly
improve the design for some hard-to-place circuits.

We will present the general placement problem in more detail in Chapter 4, and
the experimental automatic placement tool implemented in our integrated CAD system in
Chapter 5.


Chapter  2 

Principles  of  Multi-Level 
Simulation 


To cope with the rapid increase in complexity of the VLSI design task, various kinds of
simulators have been developed. Although these simulators have been very helpful in aiding
designers to build complex VLSI circuits, the need for a multi-level simulator has become
more and more evident. The idea behind multi-level simulation is to build a simulator that
accepts circuits with different components described at different levels of abstraction. This
kind of description is a natural result of any top-down design methodology.

A multi-level simulator supports the top-down design methodology by allowing designers
to start with a specification of what they want to build at a very high level of abstraction.
The whole circuit is usually decomposed into a number of components, and the functional
behaviors of these components as well as their interconnections are then described. At this
stage, the circuit should be simulated using high level features of a multi-level simulator,
since this is the best time to correct any errors.

As the design process continues, components are decomposed into smaller and smaller
components, and the levels of abstraction at which these components are described become
lower and lower. This implies a hierarchical tree structure¹ in which each internal node
represents a supercomponent; each leaf may represent a simple component that could be
implemented directly at the lowest level, or it might represent a supercomponent that will
be decomposed further. Each refinement step consists of decomposing one of the
supercomponents, usually a small part of the circuit, into smaller components, and producing a
description for each newly created component at some level of abstraction (which is usually
lower than that of the parent). After each refinement step, it is a good idea to
simulate the whole circuit in order to find possible errors made during the refinement step.
This suggests the need for a multi-level simulator.

¹ The structure could be a DAG as the result of common subexpression elimination.
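
The refinement process described above can be pictured with a small data structure. This is only an illustrative sketch, not the representation used by our system; the class `Component`, its methods, and the level names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A node in the design hierarchy, described at some level of abstraction."""
    name: str
    level: str                       # e.g. "functional" or "switch" (illustrative labels)
    children: list = field(default_factory=list)

    def refine(self, parts):
        """One refinement step: decompose this component into smaller ones."""
        self.children = list(parts)

    def leaves(self):
        """Yield the components that would actually be simulated."""
        if not self.children:
            yield self
        else:
            for child in self.children:
                yield from child.leaves()


# Start from a single high-level specification and refine it piece by piece.
cpu = Component("cpu", "functional")
cpu.refine([Component("alu", "functional"), Component("regfile", "functional")])
cpu.children[0].refine([Component("adder", "switch"),
                        Component("shifter", "functional")])

mixed = [(c.name, c.level) for c in cpu.leaves()]
print(mixed)
```

After the second refinement the leaves are described at different levels, which is exactly the situation a single-level simulator cannot handle as a whole.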

2.1      Levels  of  Simulation 

In order for a circuit to be simulated by computer, it should first be represented by a
mathematical model. Various models are used for this purpose; each corresponds to a different
level of abstraction of the circuit. Of course, each model has advantages and
disadvantages. Some models allow designers to specify more accurate and detailed information about
the circuit than other models do, and we say these models are of lower level relative to
the other models. Usually, simulators operating on lower-level models perform more
accurate simulation than those operating at higher levels, but they also consume much more
time. Every part of the circuit should be simulated at the lowest possible level at some point.
On the other hand, lower-level descriptions of the circuit are not available at early
stages of design, so high level simulators are necessary. This explains why simulators at
various levels exist at the same time.

In  the  following  subsections,  we  are  going  to  discuss  five  models  that  are  used  most 
often,  and  their  advantages  and  disadvantages. 


2.1.1  Architectural-Level  Simulation 

Simulations of this kind are employed at the system design stage to predict the performance of
the system, and to determine architectural parameters of the system, such as the amount of
cache, or the number of registers. Behaviors of components are usually specified in a very
incomplete way. Only those that are relevant to system performance are specified, and only
relevant parts are simulated. This level of simulation is very different from the other levels
of simulation that we are going to discuss below, both in concept and in implementation.

Architectural-level  simulations  are  usually  done  in  an  ad  hoc  way  because  of  the  lack  of 
good  mathematical  models  [OrR  83],  [SSS  87]. 

2.1.2  Functional-Level  Simulation 

At this stage of design, systems have been divided into smaller components. The functional
behaviors of these components and the interactions between them are specified. Usually, no
structural information is given at this time. For the purpose of functional behavior
description, as well as others, many specification languages, often called Hardware Description
Languages (HDLs), have been invented. While some of these HDLs were created from scratch, like
VHDL [LMS 86], some were simply modifications of existing programming languages, like
ADLIB [HiC 87] from Pascal, and SIMMER [LaK 85] from LISP. For simplicity, we chose
a superset of the C programming language as our HDL. Simulations at this level are used to
find behavioral specification faults as soon as possible; it is not worth the
significant effort of carrying such faults into a lower level implementation.

Functional-level simulators are usually very fast, since they ignore many details of the
circuit, such as structural information and how the circuit will be implemented at a lower
level. They usually act like an abstract mapping between inputs and outputs. Most
programming techniques can be used to speed up the simulation. Furthermore, specifications at this
level are usually treated as the definitions of the circuit to be designed. The correctness of
designs at lower levels is checked against the specifications at the functional level.

2.1.3  Gate-Level  Simulation 

At this level of abstraction, the primary building blocks of circuits are gates (e.g. AND gates,
OR gates, etc.) connected by memoryless nets. Usually, only a limited number of predefined
building blocks are available to the designer. Circuits are described in some kind of HDL,
and the specifications are usually produced directly by hand.

Gate-level simulations and functional-level simulations have much in common.
They are very similar in the sense that both gates and functional blocks act as mappings
between their inputs and outputs, although gates are usually simpler and functional blocks
are usually more complex; they both have memoryless nets as their interconnections; and
signal flows between them are logically unidirectional. Perhaps the only difference is the
complexity of their components. So it is much easier to mix these two levels together than
with other levels of simulation. This may be why, for many multi-level simulators, this is
the lowest level of description they can handle.
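
The observation that gates and functional blocks are both mappings from inputs to outputs can be illustrated with a one-bit full adder. The sketch below is illustrative only (plain Python, not our HDL); it describes the adder once as an abstract function and once as a network of gates, and checks the gate-level design against the functional specification. All function names are hypothetical.

```python
from itertools import product


def full_adder_functional(a, b, cin):
    """Functional-level view: an abstract mapping from inputs to outputs."""
    total = a + b + cin
    return total & 1, total >> 1          # (sum, carry-out)


def full_adder_gates(a, b, cin):
    """Gate-level view: the same mapping built from XOR, AND, and OR gates."""
    p = a ^ b                             # XOR gate
    s = p ^ cin                           # XOR gate
    cout = (a & b) | (p & cin)            # two AND gates feeding an OR gate
    return s, cout


def equivalent(spec, impl, n_inputs):
    """Exhaustively compare an implementation against its specification."""
    return all(spec(*bits) == impl(*bits)
               for bits in product((0, 1), repeat=n_inputs))


print(equivalent(full_adder_functional, full_adder_gates, 3))
```

Exhaustive comparison is only practical for very small blocks; it is used here simply to show the functional description acting as the definition of the circuit.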

2.1.4  Switch-Level  Simulation 

While the gate-level model is a good low level representation for circuits built in TTL or ECL
technology, it is one level too high for circuits built in MOS technology. In MOS circuits,
the basic building blocks are various types of transistors connected by wires capable of
"remembering" the previous state for a short period of time. Signals can flow bidirectionally,
and circuits can be implemented directly from transistors, not just from a limited
number of predefined gates.

Since the introduction of the switch-level model by Bryant [Bry 84], many switch-level
simulators have been developed [Bry 87]. On the one hand, they can capture phenomena
such as bidirectional signal flow, charge sharing, etc., thus providing more accurate
simulation than gate-level simulators. On the other hand, by using a discrete model instead of
a linear model, and by efficient implementation, most of the switch-level simulators achieve
a speed approaching that of the gate-level simulators.

2.1.5      Circuit-Level  Simulation 

Perhaps the most time consuming as well as the most accurate simulator currently
available is a circuit-level simulator, such as SPICE [Nag 75] or RELAX [LeS 82].
It performs a detailed analysis of the electrical behavior of the design by solving a set of
nonlinear equations. "Estimates show that circuit simulation of a single multiply instruction
using SPICE on a 450,000 transistor CPU chip would take approximately 6 CPU months
to complete on an IBM 370/168 processor, and require 250Mb of memory." [ONC 86] Due
to the enormous amount of time consumed, large circuits are seldom simulated at this
level. Instead, only a very small portion of the circuit is simulated at one time. Designers
first isolate the possible faults into a very small region, with the assistance of other tools
such as a timing simulator or a switch-level simulator. This small region is then simulated
at the circuit level, using specially derived input data.

2.2     Multi-Level  Simulation 

The problem with the conventional multi-simulator approach is that no one simulator is useful
throughout all phases of design. Usually, several types of simulators are used at various
stages of design. Since each simulator is capable of simulating at only one level of abstraction,
the whole design has to be described at that same level in order to be simulated by the
simulator. If only part of the design is described at this level, that part can be simulated,
but only that part. The interactions of this part with the rest of the system, for
instance, cannot be simulated. This limits the use of top-down design methodology, which
has been proven to be effective in designing large systems.

Given these problems and limitations, many attempts have been made in the
direction of multi-level simulation, such as Multi-Sim [ChC 78], SALOGS-IV [CaS 78], the
mixed-mode simulator in [ABK 80], and SABLE/HELIX [HiC 87]. For example,
SABLE/HELIX provides users with a multi-level hardware behavioral description language,
ADLIB, based on Pascal, and a structural description language, SDL. While ADLIB is
used to describe behavioral information of a circuit, SDL is used to describe structural
information. Both languages are multi-level in the sense that ADLIB provides users with
different data structures and their interpretations, and SDL allows users to describe the
structure of a circuit as a hierarchy of components. Its advantages as a multi-level simulator are
due mainly to the multi-level description language ADLIB. For example, a memory address
could be represented as an integer (at a higher level), or it could be represented as an array
of real numbers denoting the instantaneous voltages on a set of address lines (at a lower
level). Circuit components described at different levels could be simulated simultaneously.
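
The memory address example can be sketched as a pair of conversions between the two representations. This is written in Python rather than ADLIB and does not reflect ADLIB's actual syntax or library; the supply voltage, the logic threshold, the bit ordering, and both function names are assumptions made for illustration.

```python
VDD = 5.0        # assumed supply voltage, in volts
THRESHOLD = 2.5  # assumed logic threshold, in volts


def address_to_voltages(address, width):
    """Lower-level view: one voltage per address line, least significant bit first."""
    return [VDD if (address >> i) & 1 else 0.0 for i in range(width)]


def voltages_to_address(voltages):
    """Higher-level view: collapse the line voltages back into an integer."""
    return sum(1 << i for i, v in enumerate(voltages) if v > THRESHOLD)


print(address_to_voltages(6, 4))
print(voltages_to_address([0.3, 4.7, 4.9, 0.1]))
```

A multi-level simulator needs some such translation at every boundary where a component described at one level drives a component described at another.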

Although  all  of  these  systems  share  some  of  the  merits  of  a  multi-level  simulator,  each 
has  its  own  drawbacks. 

•  Some of the multi-level simulators were designed only for a special class of circuits.
For instance, Multi-Sim was designed solely for microprocessor based systems.

•  Although allowing modules to be described at multiple levels of abstraction, some
simulators, like Multi-Sim and the mixed-mode simulator described in [ABK 80], have
only a limited number of predefined modules available at the functional level. Users
cannot define their own modules at the functional level.

•  For low level circuit descriptions, some simulators, such as Multi-Sim, SALOGS-IV,
the ADLIB simulator, and the mixed-mode simulator in [ABK 80], only allow users to
design their circuits in some kind of description language, instead of extracting
circuit descriptions from layout. Although useful, this kind of description cannot be
used as a substitute for layout extraction, since such a design is only what designers
think the circuit would be, not what will be fabricated. Furthermore, an HDL design
would require every transistor and connection to have a textual specification as well
as a graphical description, and the consistency between the two designs would
depend on human vigilance.

•  For some multi-level simulators, gate-level simulation is the lowest level of simulation
that they can perform, while VLSI circuits need switch-level simulation.

Chapter 3


MSIM:  The  Multi-Level  Simulator 


The goal of our multi-level simulator is to allow different parts of a circuit to be described in
different HDLs, at different levels of abstraction, and still be simulated together.
Much effort was taken during implementation to permit new HDLs to be added
to the multi-level simulator easily. The simulator is an interactive, event-driven multi-level
simulator which operates in unit delay mode. So far, two levels of circuit description
are recognized by our multi-level simulator: the functional level and the switch level.

Some  of  the  features  of  our  multi-level  simulator  are: 

•  Acceptance  of  different  levels  of  VLSI  circuit  descriptions; 

•  Linear  time  race  detection; 

•  Linear  time  switch-level  simulation  algorithm; 

•  Unlimited number of levels of node size and transistor size;

•  Choice of simulating at high levels of abstraction for speed, or low levels of abstraction
for accuracy;

•  Compatibility with Magic.

3.1      Switch-Level Simulation

Our switch-level simulation is based on the switch-level model introduced by Bryant [Bry 84].
It accepts as input the net list extracted from the geometrical layout produced by Magic (a
VLSI painting and circuit extraction system designed at U.C. Berkeley). Although it uses the
same mathematical model as MOSSIM, our simulator has a completely different
implementation. Instead of solving linear equations, we do a graph traversal using the best-first
search technique. Also, extensive table lookups are adopted to speed up the simulation.
Another feature implemented is a linear time race detection algorithm.

3.1.1      The  Network  Model 

In this section, we present the network model introduced by Bryant for switch-level
simulators (see [Bry 84] for details).

In Bryant's switch-level model, a MOS circuit is modeled as a set of nodes $\{n_1, n_2, \ldots, n_m\}$
connected by a set of transistors $\{t_1, t_2, \ldots, t_n\}$. Each node $n_i$ has a state in the set
$T = \{0, 1, X\}$, and is classified as either an input node or a storage node. For each storage
node, there is a size in the set $\{\kappa_1, \kappa_2, \ldots, \kappa_m\}$ associated with it. The size of a node
indicates its approximate capacitance relative to other nodes with which it may share charge,
where sizes are ordered $\kappa_1 < \kappa_2 < \cdots < \kappa_m$. A node with state 0 indicates the presence
of a low voltage signal; a node with state 1 indicates the presence of a high voltage signal;
and a node with state X indicates the presence of an unknown voltage signal. No attempt is
made to distinguish between an "unknown but valid" state (a state in which the node has a valid
voltage that is not known to the simulator) and an "invalid" state (a state in which the node has
an invalid, intermediate voltage). Input nodes are sources of electrical current, like Vdd and
GND. They have size $\omega$ to distinguish them from storage nodes.
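
To illustrate how node sizes are used, the following sketch resolves the state of a group of nodes that are connected together and share charge. It assumes the usual rule of the switch-level model, namely that the nodes of the largest size in the group determine the result and that disagreement among them yields X. It ignores transistor strengths entirely and is not the charge-sharing model developed later in this thesis. The names `OMEGA` and `shared_state` are hypothetical.

```python
import math

OMEGA = math.inf  # size of an input node; exceeds every storage-node size


def shared_state(nodes):
    """Resolve the common state of a group of connected nodes.

    `nodes` is a list of (state, size) pairs, with state in {"0", "1", "X"}
    and size either a positive number or OMEGA for input nodes.
    """
    largest = max(size for _, size in nodes)
    # Only the nodes of the largest size contribute to the result.
    states = {state for state, size in nodes if size == largest}
    return states.pop() if len(states) == 1 else "X"


print(shared_state([("1", 2), ("0", 1)]))       # the larger node wins
print(shared_state([("1", 1), ("0", 1)]))       # equal sizes disagree
print(shared_state([("0", OMEGA), ("1", 3)]))   # an input node dominates
```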

The basic building block in a MOS circuit is the transistor, which is used primarily as a
switching element. A transistor is a three terminal device with terminals labeled "gate",
"source", and "drain". The source and drain terminals are symmetric; signals can be
transmitted bidirectionally between them, controlled by the voltage at the gate terminal.
Each transistor $t_i$ also has a state in the set $T = \{0, 1, X\}$, but with 0 indicating an open
(non-conducting) switch, and 1 indicating a closed (conducting) switch. A transistor in the
X state forms an indeterminate conductance between (inclusively) its conductance when
open and that when closed. Also, each transistor has a strength in the set $\{\gamma_1, \gamma_2, \ldots, \gamma_n\}$
indicating its conductance when closed relative to other transistors which may form part
of a ratio path, where strengths are ordered $\gamma_1 < \gamma_2 < \cdots < \gamma_n$.

There are three types of transistors, each of which behaves differently in response to the
voltage at the gate, as shown in Table 3.1.

gate state    n-type     p-type     d-type

    0         open       closed     closed
    1         closed     open       closed
    X         unknown    unknown    closed

Table 3.1: Transistor types

For instance, for an n-type transistor, a high voltage at its gate causes a high
conductance path between the source and the drain, and a low voltage at the gate isolates
the source from the drain. A d-type transistor corresponds to a depletion mode transistor,
which is always closed.
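
The mapping in Table 3.1 can be written as a small C function. This is only an
illustrative sketch: the type names Ternary and TransType and the function
transistor_state are hypothetical and are not taken from the simulator's source.

```c
#include <assert.h>

/* Ternary logic values; TX stands for the unknown state X. */
typedef enum { T0 = 0, T1 = 1, TX = 2 } Ternary;

/* The three transistor types of Table 3.1 (names are illustrative). */
typedef enum { N_TYPE, P_TYPE, D_TYPE } TransType;

/* Return the transistor state (T0 = open, T1 = closed, TX = unknown)
 * for a given transistor type and gate state, following Table 3.1. */
Ternary transistor_state(TransType type, Ternary gate)
{
    assert(gate == T0 || gate == T1 || gate == TX);
    switch (type) {
    case N_TYPE:                      /* closed when the gate is 1 */
        return gate;
    case P_TYPE:                      /* closed when the gate is 0 */
        if (gate == TX)
            return TX;
        return gate == T0 ? T1 : T0;
    case D_TYPE:                      /* depletion mode: always closed */
        return T1;
    }
    assert(0 && "unknown transistor type");
    return TX;
}
```

For example, transistor_state(P_TYPE, T0) yields T1 (closed), matching the p-type column
of the table.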

3.1.2      Algorithm

The algorithm we used is based on the network model described in the previous section.
Under the so-called unit delay mode, every transistor takes the same amount of time to
switch, and signals take the same amount of time to travel through any wire, no matter how
long or short it is. A combinational network n levels deep therefore takes exactly n time
units to stabilize. Within each time unit, all the transistors whose gates have changed
state switch first, and then all the signals travel through "closed" (conducting)
transistors as far as they can go.

Formally, the final state of the circuit after one unit step is determined by the so-called
steady-state response function. If no X state is present in the circuit, the steady-state
response function $F(y, z)$ equals the vector of node states $y'$ that would result if the
nodes were initialized to the states given by the vector $y$, and the transistors were held
fixed in the states given by the vector $z$, where $y \in \{0,1\}^m$ and $z \in \{0,1\}^n$.
For the case where $y \in \{0,1,X\}^m$ and $z \in \{0,1,X\}^n$, we define the steady-state
response on node $n_i$ as 0 (or 1) iff it would have this same steady-state response if the
nodes initially in the X state were set to any combination of 0's and 1's and the
transistors in the X state were set to any combination of 0's (open) and 1's (closed), and
as X otherwise. To compute the steady-state response, we need the following definitions
[Bry 84].

Definition 3.1 The least upper bound (lub) of a set of ternary values equals 1 (or 0)
iff all elements of the set equal 1 (or 0), and equals X otherwise. The lub operation acts
as a "consistency" operation with inconsistency represented by X.
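
As a minimal sketch of Definition 3.1, the lub of a non-empty array of ternary values can
be computed in one pass. The Ternary type and the function name lub are assumptions made
for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Ternary logic values; TX stands for the unknown state X. */
typedef enum { T0 = 0, T1 = 1, TX = 2 } Ternary;

/* Least upper bound of a non-empty set of ternary values:
 * 0 if every element is 0, 1 if every element is 1, X otherwise. */
Ternary lub(const Ternary *vals, size_t n)
{
    assert(vals != NULL && n > 0);
    Ternary acc = vals[0];
    for (size_t i = 1; i < n; i++)
        if (vals[i] != acc)
            return TX;      /* two different values are inconsistent */
    return acc;             /* all equal; this is TX if all were TX */
}
```

A set that mixes 0 and 1, or that contains any X, therefore evaluates to X.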

Definition 3.2 The ternary switch graph S of a circuit contains a vertex $v_i$ for each
node $n_i$ in the circuit, with size Size($v_i$) equal to the size of node $n_i$, and with
state State($v_i$) equal to the initial state of node $n_i$. S contains a 1-edge for each
transistor in the circuit whose initial state is 1, and an X-edge for each transistor whose
initial state is X. Edge $e_i$ (either a 1-edge or an X-edge) connects the vertices
corresponding to the source and drain of transistor $t_i$ and has strength Strength($e_i$)
equal to the strength of this transistor.

The effect of the initial voltage of one node on the steady-state voltage of another through
a series of conducting (possibly X) transistors is described in terms of a rooted path.

Definition 3.3 A rooted path p in a ternary switch graph is a quadruple (Root(p), Dest(p),
1-Edges(p), X-Edges(p)) consisting of an initial vertex Root(p), a final vertex Dest(p), a
set of 1-edges 1-Edges(p), and a set of X-edges X-Edges(p), such that the elements of the
set Edges(p) = 1-Edges(p) $\cup$ X-Edges(p) form a contiguous simple path from Root(p) to
Dest(p). The strength of p, denoted $|p|$, is equal to

$$|p| = \min\bigl(\{\mathrm{Size}(\mathrm{Root}(p))\} \cup \{\mathrm{Strength}(e_i) \mid e_i \in \mathrm{Edges}(p)\}\bigr).$$

Definition 3.4 A rooted path p is called a definite path if X-Edges(p) = $\emptyset$.

Definition 3.5 A rooted path p in a ternary switch graph is said to be blocked iff for some
initial segment $p'$ of p and some definite rooted path q, Dest($p'$) = Dest(q) and
$|p'| < |q|$.
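
Definitions 3.3 and 3.5 can be illustrated with two short C functions. The sketch assumes
that node sizes and transistor strengths are encoded as integers on one common scale
(larger means stronger), and that the strength of the strongest definite path reaching each
vertex along the path is already known; the function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Strength of a rooted path: the minimum of the root's size and the
 * strengths of all edges on the path (Definition 3.3). */
int path_strength(int root_size, const int *edge_strengths, size_t n_edges)
{
    assert(n_edges == 0 || edge_strengths != NULL);
    int s = root_size;
    for (size_t i = 0; i < n_edges; i++)
        if (edge_strengths[i] < s)
            s = edge_strengths[i];
    return s;
}

/* Blocked test (Definition 3.5).  best_definite[k] is assumed to hold the
 * strength of the strongest definite path ending at the k-th vertex of the
 * path (k = 0 is the root), so it has n_edges + 1 entries.  The path is
 * blocked iff some initial segment is strictly weaker than a definite
 * path arriving at that segment's destination. */
int is_blocked(int root_size, const int *edge_strengths, size_t n_edges,
               const int *best_definite)
{
    assert(best_definite != NULL);
    assert(n_edges == 0 || edge_strengths != NULL);
    int s = root_size;                  /* strength of the empty segment */
    if (s < best_definite[0])
        return 1;
    for (size_t k = 0; k < n_edges; k++) {
        if (edge_strengths[k] < s)
            s = edge_strengths[k];      /* strength of segment with k+1 edges */
        if (s < best_definite[k + 1])
            return 1;
    }
    return 0;
}
```

Note that equality does not block: a path is blocked only by a strictly stronger definite
path.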

It is easily seen that an unblocked path from node $n_j$ to node $n_i$ means that the
signal from $n_j$ is one of the strongest among all that can reach $n_i$; hence the current
state of $n_j$ will affect the final state of $n_i$. We write $j\,P(z)\,i$ iff there exists
an unblocked rooted path p in the ternary switch graph such that Root(p) = $v_j$ and
Dest(p) = $v_i$. Then the steady-state response of node $n_i$ for initial node state $y$ and
transistor state $z$ is given by $y'_i$:

$$y'_i = \mathrm{lub}\{\, y_j \mid j\,P(z)\,i \,\}. \qquad (3.1)$$

From the above definitions we can see that the steady-state response function can be
computed by traversing the switch graph three times. During the first traversal, we compute
all the definite paths. During the second traversal, we find all the unblocked paths
starting at a node with initial state 1 or X; and during the third traversal, we find all
the unblocked paths starting at a node with initial state 0 or X. We then use Equation 3.1
to compute the steady-state response of each node. If no unblocked path from a node with
initial state 1 or X to node $n_i$ has been found, then the steady-state response for $n_i$
is 0; if no unblocked path from a node with initial state 0 or X to $n_i$ has been found,
then the steady-state response for $n_i$ is 1; otherwise it is X.
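
The final decision rule can be sketched as a function of two flags recorded during the
second and third traversals. The flag and function names are illustrative assumptions; the
sketch also assumes that every node is reached by at least one unblocked path (for
instance, the strongest path reaching it), so at least one flag is set.

```c
#include <assert.h>

/* Ternary logic values; TX stands for the unknown state X. */
typedef enum { T0 = 0, T1 = 1, TX = 2 } Ternary;

/* reached_by_1_or_X: an unblocked path from a node in state 1 or X exists.
 * reached_by_0_or_X: an unblocked path from a node in state 0 or X exists. */
Ternary steady_state(int reached_by_1_or_X, int reached_by_0_or_X)
{
    assert(reached_by_1_or_X || reached_by_0_or_X);
    if (!reached_by_1_or_X)
        return T0;      /* only 0 signals arrive */
    if (!reached_by_0_or_X)
        return T1;      /* only 1 signals arrive */
    return TX;          /* conflicting or unknown signals arrive */
}
```

This gives the same result as applying Equation 3.1 to the states of all nodes with an
unblocked path to the node in question.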

Algorithm  3.1   Switch-level  Simulation  Algorithm. 

INPUT: A list of nodes and a list of transistors representing a circuit. Each node in the
list has an initial state, which may be the result of previous simulations, or X. Each
transistor in the list also has an initial state. Also provided as input is an event queue
of nodes with their new states.

OUTPUT: The output of this algorithm is reflected by the side effects on the list of nodes
given as input. As a result, some nodes will be set to a new state.

METHOD:

1. If the event queue is empty, or the time limit is exceeded, return.

2. For each node on the event queue, find those transistors of which it is the gate
and change their states.

3. For each node on the event queue, find those transistors of which it is the gate,
and put each transistor's source and drain, and all the unqueued nodes reachable
from them, onto a multi-level queue.

4. Push signals on the multi-level queue through "on" transistors, strongest first.
Each time a signal reaches a node, remember the signal with its strength at the
node. After this step, all definite paths are found.

5. Push signals on the multi-level queue through "on" or "X" transistors, strongest
first. Each time a signal reaches a node, remember the signal with its strength
at the node. Note that the definite paths found in step 4 are checked to see if any
signals being pushed are blocked.

6. Empty the event queue.

7. For each node on the list, compute its new state using Equation 3.1. If the new
state is different from the old one, put the node onto the event queue.

8. Go to step 1.


3.1.3     Implementation 

The main part of Algorithm 3.1 is finding the strongest paths of a certain kind for each
node. As we can see from Algorithm 3.1, at least two passes through the ternary switch
graph are needed before we can use Equation 3.1 to compute the steady-state response for
each node. First we have to find all definite paths, by traversing the ternary switch graph
once, so that we know which paths will be blocked during subsequent passes. Next, all
signals are pushed through "on" or "X" transistors, as long as they are not blocked by any
definite path found in the previous pass. Since a definite path can block a signal only if
it is stronger than the signal, by pushing signals in the right order we can traverse the
ternary switch graph only once. We may encode all signals into one record and push them
together, as long as the definite paths are in place before we push the other signals that
they should block. We do this by best-first search, a variant of breadth-first search in
which, instead of choosing the leftmost child to expand, we choose the "best" one (the one
with the strongest signal). To do so, we keep m + n groups of sublists, one for each
strength in the set $S = \{\gamma_n, \ldots, \gamma_1, \kappa_m, \ldots, \kappa_1\}$. The
group to which a node belongs is determined by the strength of the definite signal¹ on the
node. Within each group, there are at most m + n sublists, and the sublist to which a node
belongs is determined by the strength of the strongest unblocked, non-definite signal on
the node. These sublists play the role of the event list in most event-driven simulators,
except that the list is split into several sublists. The purpose of dividing the event
queue into sublists is to simplify the selection of the "best" node during the best-first
search. Signals in group $\gamma_n$ are propagated first, then those in group
$\gamma_{n-1}$, and so on. Within each group, signals on the sublist corresponding to the
greatest strength are propagated first. Here we give priority to definite signals, since
they are capable of blocking other signals. We push the strongest signal first, so that
some of the weaker signals will be blocked when we try to push them further, and they are
simply ignored. This way, we can find the strongest path for all nodes in time O(s + t),
where s is the size of the set S, and t is the number of transistors in the circuit.

¹ There is a definite signal on a node n iff there is an unblocked definite path p such
that Dest(p) = n. The strength of this definite signal equals $|p|$.
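
The idea of selecting the "best" signal from per-strength sublists can be sketched with a
bucket queue. This is a deliberately simplified, single-level illustration, not the
two-level group/sublist structure described above; NUM_STRENGTHS, the struct layout, and
the bq_* function names are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_STRENGTHS 16    /* hypothetical number of distinct strength levels */

typedef struct Item {
    int node;               /* node identifier */
    struct Item *next;
} Item;

typedef struct {
    Item *bucket[NUM_STRENGTHS];    /* bucket[s]: nodes whose signal has strength s */
    int top;                        /* highest bucket that may be non-empty, or -1 */
} BucketQueue;

void bq_init(BucketQueue *q)
{
    for (int s = 0; s < NUM_STRENGTHS; s++)
        q->bucket[s] = NULL;
    q->top = -1;
}

/* Queue a node under the given strength.  Returns 0 on success, -1 if
 * memory allocation fails. */
int bq_push(BucketQueue *q, int node, int strength)
{
    assert(strength >= 0 && strength < NUM_STRENGTHS);
    Item *it = malloc(sizeof *it);
    if (it == NULL)
        return -1;
    it->node = node;
    it->next = q->bucket[strength];
    q->bucket[strength] = it;
    if (strength > q->top)
        q->top = strength;
    return 0;
}

/* Remove a node with the greatest queued strength.  Returns 1 and fills
 * *node and *strength on success, or 0 if the queue is empty. */
int bq_pop(BucketQueue *q, int *node, int *strength)
{
    while (q->top >= 0 && q->bucket[q->top] == NULL)
        q->top--;                   /* skip exhausted buckets */
    if (q->top < 0)
        return 0;
    Item *it = q->bucket[q->top];
    q->bucket[q->top] = it->next;
    *node = it->node;
    *strength = q->top;
    free(it);
    return 1;
}
```

Because the strength of a path never exceeds the strength of any of its initial segments,
a signal pushed while propagating from a popped node is never stronger than that node's
signal. In such a traversal the top index only moves downward, which is what keeps the scan
over the strength levels from being repeated.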

3.1.4      Race  Condition  Detection 

A node in a circuit is said to have a potential race condition if its final state depends
on the relative speeds of the signals reaching it. In other words, a race condition occurs
at a node if it settles in different states for different orders of switching of the gates
in the circuit. Fig. 3.1 gives an example of a circuit in which a race condition occurs.

In Fig. 3.1, after phase phi1, both A and B will settle to 1. During phase phi2, node A
switches from 1 to 0 as phi2 switches from 0 to 1. Depending on how fast the two
transistors s1 and s2 switch relative to each other, node B could end up at 1 or 0. If
transistor s2 switches faster than the other, then B will remain at 1, as set during phase
phi1, at least for a while. On the other hand, if transistor s1 switches faster, then there
will momentarily be a path from B to ground. If this path persists long enough, B could be
driven to 0. So we say that a race condition occurs at node B.

Figure 3.1: Race condition at node B

An option can be turned on in our switch-level simulator to detect race conditions. The
technique used here to detect race conditions is called ternary simulation [Bry 83]. It
works by inserting another phase, called the transition phase, between each pair of phases
at which input data or clocks are changed. At the beginning of each phase, just as we are
about to change inputs or clocks, we first set the changing inputs to X and simulate the
network as usual until a stable state is reached. As a consequence, all nodes that could
possibly change state during this phase are set to X. This is because a node can change its
state only if some of the switching transistors can cause different values to reach it.
Setting those transistors to X still allows the same values to reach the node, except that
they no longer arrive along a definite path, so the node is set to the X state. Next, all
the changing inputs are set to their final values and the network is again simulated until
a stable state is reached. As a result, any node whose final logic level depends on the
particular logic or wiring delays in the circuit will remain set to X, indicating a race
condition or even a possible sequential timing error. The reason is that a node that became
X during the transition phase may change its state, and if its state is still X after the
following phase, then there exists no definite path that brings a stronger signal to
override the X. Furthermore, a kind of hazard on a node is indicated by a state sequence of
the form 0 → X → 0 or 1 → X → 1.

Algorithm  3.2   Race  Detection  Algorithm. 

INPUT:  A  list  of  nodes  and  a  list  of  transistors  representing  a  circuit;  and  an  event  queue 
Q  of  nodes  with  their  new  states. 

OUTPUT:    The  output  of  this  algorithm  is  reflected  by  the  side  effects  on  the  list  of  nodes 
given  as  input.  Some  nodes  will  be  set  to  a  new  state. 

METHOD:

1. Construct a new event queue Q' consisting of the same nodes as on Q. Set the
new state of each node on Q' to X.

2. Invoke Algorithm 3.1 to simulate the circuit, using the new event queue Q'.

3. Again, invoke Algorithm 3.1 to simulate the circuit, but this time use the old
event queue Q.

4. For each node $n_i$ on the node list, let $s_i^0$ be its original state, let $s_i^1$ be
the state after step 2, and let $s_i^2$ be the state after step 3. If $s_i^2 = X$ then
report a race condition at node $n_i$; if $(s_i^0, s_i^1, s_i^2) = (0, X, 0)$ or
$(s_i^0, s_i^1, s_i^2) = (1, X, 1)$ then report a hazard at node $n_i$.
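
Step 4 of Algorithm 3.2 amounts to a classification of the three recorded states of a node.
A minimal sketch follows; the enum and function names are assumptions made for this
illustration.

```c
#include <assert.h>

/* Ternary logic values; TX stands for the unknown state X. */
typedef enum { T0 = 0, T1 = 1, TX = 2 } Ternary;

typedef enum { NODE_OK, NODE_RACE, NODE_HAZARD } Diagnosis;

/* s0: original state; s1: state after the transition phase (step 2);
 * s2: state after the inputs take their final values (step 3). */
Diagnosis classify_node(Ternary s0, Ternary s1, Ternary s2)
{
    assert(s0 <= TX && s1 <= TX && s2 <= TX);
    if (s2 == TX)
        return NODE_RACE;       /* the final state remains unknown */
    if (s1 == TX && s0 == s2)
        return NODE_HAZARD;     /* 0 -> X -> 0 or 1 -> X -> 1 */
    return NODE_OK;
}
```

Since s2 is known to be 0 or 1 when the second test is reached, the comparison s0 == s2
covers exactly the two hazard patterns.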


In our example in Fig. 3.1, at the beginning of phase phi2, we set phi2 to X and simulate
the circuit. Both A and B will settle in X after this simulation. Then we set phi2 to its
final state, 1, and the circuit is simulated again. First, s1 will be turned on while s2 is
in the unknown state. This will not change the state of B. Then s2 will be turned off,
which still does not change the state of B. So B will remain in the X state after phase
phi2, indicating a race condition.

3.1.5      The  Simulation  of  Charge  Sharing 

Charge sharing occurs when a node in a circuit is connected electrically to nodes storing
opposite values. Fig. 3.2 gives one such example.


[Figure: node a = 1 and node b = 0 are both connected to node out; out = ?]
Figure 3.2: Charge sharing

In reality, the node out might reach state 1 if node a holds more charge than b does; it
might reach state 0 if node b holds more charge than a does; or it might go to an undefined
intermediate state if nodes a and b hold about the same amount of charge. In other words,
the signal carried by the line with the higher capacitance, or by the line with the
stronger current source, will establish itself at the node. Usually this is considered a
risky practice and should be avoided. In some cases, though, it is very useful to take
advantage of charge sharing, as in precharged circuits. For example, a bus, which has a
large capacitance and therefore requires a long time to be fully charged, is sometimes
precharged during one clock phase and shares its charge with other nodes during the other
clock phase in a two-phase clocking scheme. The reason this organization can be useful is
that MOS transistors can discharge a bus more quickly than they can charge it: there is a
lack of symmetry in switching capabilities.




This charge sharing phenomenon creates a new problem for our switch-level simulator,
because our assumption that a stronger signal overrides any number of weaker signals is not
accurate in this case. With charge sharing, several weaker signals might override a
stronger signal of the opposite state, as long as the sum of the charges on the weaker
signals is significantly greater than the charge on the stronger signal. So how do we
calculate the steady-state response for nodes sharing charge? A natural approach, as taken
by Baker and Terman (for circuits with no X-transistors) [BaT 80], is to go through the
connected nodes and sum up the capacitances for each logic level. Then, depending on the
ratio of the total capacitance for one level to the sum over all levels, compared with a
predefined threshold, nodes are assigned a final state. More precisely, let $s^1_i$ be the
sum of the strengths of the 1 signals that can reach node $n_i$; let $s^0_i$ be the sum of
the strengths of the 0 signals that can reach node $n_i$; and let $s^X_i$ be the sum of the
strengths of the X signals that can reach node $n_i$. To determine the final state of a
node $n_i$ in a circuit with no X-transistors, accumulate all signals reaching $n_i$;
assign 1 to $n_i$ if the ratio $s^1_i/(s^0_i + s^1_i + s^X_i)$ is greater than the
predefined threshold; assign 0 to the node if the ratio $s^0_i/(s^0_i + s^1_i + s^X_i)$ is
greater than the predefined threshold; and assign X to the node otherwise. In the presence
of X-transistors, however, the problem becomes harder. One conservative and reasonable
rule is to assign 1 to a node only if no mapping of X-transistors to {0,1} results in a 0
or X for that node. Similarly, we assign 0 to a node only if no mapping of X-transistors to
{0,1} results in a 1 or X. Unfortunately, deciding this is an NP-complete problem, so no
polynomial-time algorithm for it is known, and in the worst case we would essentially have
to try all possible mappings of X-transistors to {0,1} to find out whether one of them
allows a node to reach a given state. This explains why researchers, as in [BaT 80] and
[Bry 84], go back to the general assumption that a single stronger signal overrides any
number of weaker signals when charge sharing is combined with X-transistors.
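
For circuits without X-transistors, the threshold rule described above can be sketched as
follows. The function name, the use of double for the accumulated strengths, and the
restriction of the threshold to the range [0.5, 1) (so that at most one ratio can exceed
it) are assumptions of this illustration, not details taken from [BaT 80].

```c
#include <assert.h>

/* Ternary logic values; TX stands for the unknown state X. */
typedef enum { T0 = 0, T1 = 1, TX = 2 } Ternary;

/* s1, s0, sx: sums of the strengths of the 1, 0 and X signals reaching
 * the node.  threshold: the predefined ratio a level must exceed. */
Ternary charge_share_state(double s1, double s0, double sx, double threshold)
{
    assert(s1 >= 0.0 && s0 >= 0.0 && sx >= 0.0);
    assert(threshold >= 0.5 && threshold < 1.0);
    double total = s0 + s1 + sx;
    if (total <= 0.0)
        return TX;              /* assumption: no signal reaches the node */
    if (s1 / total > threshold)
        return T1;
    if (s0 / total > threshold)
        return T0;
    return TX;
}
```

With a threshold of 0.75, for instance, a node reached by 1 signals of total strength 9 and
0 signals of total strength 1 is assigned 1, while an even split is assigned X.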

Now we prove the NP-completeness of implementing the linear capacitance model for charge
sharing in the presence of X-transistors.

The charge sharing problem we have here is: given a circuit and a particular node $n_i$,
find a mapping of X-transistors to {0,1} such that the ratio
$s^1_i/(s^0_i + s^1_i + s^X_i)$ (or $s^0_i/(s^0_i + s^1_i + s^X_i)$) is maximized. If this
ratio is less than the threshold, then we know that no mapping will allow node $n_i$ to get
1 (0). A simplified version assumes that $s^X_i = 0$. It is easy to see that the simplified
charge sharing problem is equivalent to the following graph problem.

Definition 3.6 Given an undirected graph G = (V, E) whose vertices are divided into two
types, "black" vertices (B) and "white" vertices (W); that is, $V = B \cup W$ and
$B \cap W = \emptyset$. Associated with each $v \in V$ there is a weight $w(v)$ (a positive
real number). For a specific $v_0 \in V$, the "b/w" ratio problem for this vertex is to
find a connected subgraph $G' = (V', E')$ such that $v_0 \in V'$ and the "b/w ratio"

$$\frac{\sum_{v \in V' \cap B} w(v)}{\sum_{v \in V'} w(v)} \qquad (3.2)$$

is maximum. Similarly, for $v_0 \in V$, the "w/b" ratio problem is to find a connected
subgraph $G' = (V', E')$ such that $v_0 \in V'$ and the "w/b ratio"

$$\frac{\sum_{v \in V' \cap W} w(v)}{\sum_{v \in V'} w(v)} \qquad (3.3)$$

is maximum.


It is easy to see that for a "black" ("white") vertex v, the obvious solution to the "b/w"
("w/b") ratio problem is the subgraph containing the single node v and no edges. But it
can be shown that the "b/w" ("w/b") ratio problem for a "white" ("black") node is
NP-complete. To prove it, we need the following definition.

Definition 3.7 Let $Z_0^+$ denote the set of nonnegative integers. Given an undirected
graph G = (V, E), a weight $w(e) \in Z_0^+$ for each $e \in E$, and a subset
$R \subseteq V$, the Steiner tree relative to R is a subtree of G that includes all the
vertices of R and such that the sum of the weights of the edges in the subtree is
minimized.

It has been proved that the problem of finding a Steiner tree is NP-complete, even if all
edge weights are equal [GaJ 79].

Theorem  3.1  The  "b/w"  ratio  problem  for  a  "white"  vertex  is  NP-complete.  Similarly, 
the  "w/b"  ratio  problem  for  a  "black"  vertex  is  NP-complete. 

Proof: We prove that the "b/w" ratio problem for a "white" vertex is NP-complete; the "w/b"
ratio problem for a "black" vertex can be proved similarly. We reduce the Steiner tree
problem [GaJ 79] to our "b/w" ratio problem. Assume we could solve the "b/w" ratio problem
in polynomial time. We want to show that we could then find a Steiner tree in polynomial
time. Let G = (V, E) be the given graph in which we want to find a Steiner tree relative to
R, where |V| = n. Let M be the maximum of the edge weights. Choose one of the vertices in R
to be $v_0$, and assign a weight nW to it, where $W > n^2 M$. Make $v_0$ a "white" vertex,
and make the rest of the vertices in the graph "black". Assign every other $v \in R$ the
same weight W. Assign all $v \in V - R$ the same weight 0. For each $e \in E$, insert a
"white" vertex with weight equal to the weight of the edge e. Clearly, the weight of any
inserted "white" vertex is less than $W/n^2$. Then finding a Steiner tree in the original
graph is equivalent to finding the maximum "b/w ratio" in the modified graph. All vertices
in R must be included in the tree, since all the other "white" vertices are simply too
small to matter: even at the cost of including all "white" vertices, adding a single vertex
in R is beneficial. On the other hand, we want to include as few "white" vertices as
possible.

□

To avoid the difficulty of solving the charge sharing problem, we have to accept the less
time-consuming but less accurate approach, which is the one we use for the general case. We
again assume that a stronger signal will override any number of weaker signals of the
opposite state. We would like to point out, though, that this model is accurate enough in
practice, since in a well-designed circuit the situation in which many nodes share charge
does not occur very often. If we choose the threshold to be low enough, a stronger signal
will actually override several weaker signals. Another advantage of this model is that the
steady-state response under this model can be calculated in the same uniform way as in the
other cases. This makes the time complexity of our entire simulator linear in the number of
nodes plus the number of transistors.

3.2      Functional-Level  Simulation 

In our implementation, the language we chose for the functional-level circuit description
is a superset of the C programming language, called CHDL. The reason for choosing it is
that it is easy to implement. Being written in C, our switch-level simulator is easy to
link with other C programs. But the simulator has been designed so that other hardware
description languages, such as VHDL, could be added later without much difficulty.

3.2.1      Definition  of  CHDL 

First of all, CHDL is a superset of the C programming language. All the power of C can be
used in describing the behavior of a circuit. Beyond that, we have introduced the following
keywords.

bit  declares variables to be nodes. A variable of type bit can have a value of 0, 1, or X.

byte  declares a variable to be a list of 8 nodes. A variable x of type byte could be
equivalently declared as bit x[8];

cell  declares  the  identifier  following  it  to  be  the  name  of  the  current  cell. 

in  declares  variables  to  be  input  ports  of  the  current  cell. 

map  indicates  the  start  of  a  new  statement,  called  map  statement.  The  format  of  a  map 
statement  will  be  described  later. 

out  declares  variables  to  be  output  ports  of  the  current  cell. 

simulate  tells the simulator to simulate the cell at the level specified by the following
identifier, such as C or MAGIC.

state  declares variables to be nodes that have sufficient capacitance to retain their
state across function calls during simulation.

use  declares  a  cell  or  an  array  of  cells  as  subcells  of  the  current  cell.  Only  cells  so  declared 
could  be  used  in  the  map  statement. 

Several copies of the same subcell can appear simultaneously in a cell. They can be
organized as a one- or two-dimensional array if appropriate. This allows us to access them
by index, and hence to take advantage of C loop statements. If these copies are unrelated,
though, they can be included separately. Each copy is assigned an occurrence number to
distinguish it from the others.

The format of a map statement is as follows:

map subcell-name(input-pair1, input-pair2, ...; output-pair1, output-pair2, ...);

In the above map statement, subcell-name indicates an instance of a subcell, and has the
form name[occur], name[occur][index1], or name[occur][index1][index2], where occur, index1
and index2 are all integers. Here occur is the occurrence number, indicating the
instantiation of the subcell in the current cell, and index1 and index2 are indices, used
when a subcell has many copies organized as a one- or two-dimensional array. Each
input-pair has the form "port-name, value", where "port-name" is an input-port name of the
subcell, and "value" is the input value to that input port. The order of these pairs is not
important. Each output-pair has the form "port-name, var", where "port-name" is an
output-port name of the subcell, and "var" is a variable whose value should be set to that
of the corresponding output port after simulation of the subcell.

3.2.2      An Example

Here is an example of using CHDL to describe a 16-bit parallel load register:


cell REG;
simulate C;

/* Main function for subcell REG: 16-bit Parallel Load Register.
 */
REG(phi1, phi2, S1, S2, Lin, Rin, A)
in bit phi1, phi2;
in bit S1, S2;
in bit Lin, Rin;
in bit A[16];
{
    out bit B[16], Lout, Rout;
    state R[18];        /* Variables of type "state" are static; they keep
                         * their values across consecutive calls.
                         */
    int i;

    /* During phi1, things like loading and shifting will occur. */
    if (phi1 == 1)
    {
        if (S1 == 0 && S2 == 0)             /* hold */
            /* do nothing */;
        else if (S1 == 0 && S2 == 1)        /* shift left */
        {
            R[0] = Rin;
            for (i = 17; i; i--)
                R[i] = R[i-1];
        }
        else if (S1 == 1 && S2 == 0)        /* shift right */
        {
            R[17] = Lin;
            for (i = 0; i < 17; i++)
                R[i] = R[i+1];
        }
        else if (S1 == 1 && S2 == 1)        /* parallel load */
        {
            for (i = 0; i < 16; i++)
                R[i+1] = A[i];
        }

        /* Invalidate outputs. */
        for (i = 0; i < 16; i++)
            B[i] = X;
        Rout = X;
        Lout = X;

        return;
    }

    /* Validate the outputs. */
    for (i = 0; i < 16; i++)
        B[i] = R[i+1];
    Rout = R[0];
    Lout = R[17];
}

In the above program, the cell name is REG, and it has input ports phi1, phi2, S1, S2,
Lin, Rin, and A[16]. Its output ports are B[16], Lout and Rout. The cell will be simulated
at the functional level, as indicated by the simulate clause.

3.2.3     The  Preprocessor 

For each subcell described at the functional level, our preprocessor parses the C program
and produces another C program with the necessary changes. It then generates the .sim file
from the extracted version of the Magic layout by calling ext2sim, modifies the .sim file by
appending uses of the subcells described at the functional level, and finally compiles all the
C programs produced and links them with the simulator. If a module has more than one kind
of description, e.g. both a C program and a Magic layout, the user can choose to simulate the
C program for speed or the Magic layout for accuracy. This is done by changing a single word
in the C program.

The C parser is written using the Unix parser generator Yacc and the lexical analyzer
generator Lex.

3.3      Interconnection  Between  Levels 

The primary advantage of our multi-level simulator is that cells at different levels of
abstraction can be simulated together. For instance, in our current implementation, a cell
can contain any mixture of Magic layouts, schematic layouts, and C functions as subcells.
But how can a cell be included in another cell as a subcell?

This is straightforward if the two cells are both Magic cells, or are both represented in
schematic format. Both Magic and the schematic layout editor provide commands to include
one cell in another as a subcell. In fact, the schematic layout editor allows you to
include cells in any representation as subcells.

If a Magic cell, say B, contains a C cell, say A, as its subcell, we can build a Magic cell
for A containing nothing but input and output ports, and use it as a complete Magic
subcell in B. At simulation time, the corresponding C function for A will be called with
the current inputs, its outputs will be produced by the C function, and the simulation of
cell B continues.
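
The stub mechanism can be pictured as a lookup from a cell name to a C function. The following is only a sketch in plain C: the table cell_table, the cell_fn signature, and the toy AND-gate body for A are all assumptions made for illustration, since the simulator's real data structures are not given in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature for a functional-level cell: it reads the
 * current input bits and writes the output bits. */
typedef void (*cell_fn)(const int *in, int *out);

/* Toy functional cell standing in for "A": a 2-input AND gate. */
static void cell_A(const int *in, int *out) { out[0] = in[0] & in[1]; }

struct cell_entry { const char *name; cell_fn fn; };

/* Assumed table mapping port-only stub cells to their C functions. */
static const struct cell_entry cell_table[] = { { "A", cell_A } };

/* Called when simulation reaches a stub cell: look up the C function
 * by name and evaluate it on the current input values.
 * Returns 0 on success, -1 if no C function is registered. */
int eval_stub(const char *name, const int *in, int *out)
{
    size_t i;
    assert(name != NULL);
    for (i = 0; i < sizeof cell_table / sizeof cell_table[0]; i++) {
        if (strcmp(cell_table[i].name, name) == 0) {
            cell_table[i].fn(in, out);
            return 0;
        }
    }
    return -1;
}
```

With such a table, the enclosing Magic cell never needs to know how A is implemented; it only sees the values written to A's output ports.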

A cell described in CHDL can include other cells as subcells, regardless of their
representations, through the map statement. But we encourage designers to use schematic
layout for cells containing only subcells and their interconnections, since a schematic
layout is both easier to create and faster to simulate. If the cell has other logic described
in CHDL besides subcells, however, the CHDL representation has to be adopted.

3.4      A  Design  Example

Msim  is  an  interactive,  event-driven,  multi-level  simulator  for  nMOS  and  CMOS  transistor 
circuits.  It  accepts  commands  from  the  user,  executing  each  command  before  reading  the 
next. 

The first step in using the simulator is to construct a circuit at either the layout level or
the functional level, or both. The preprocessor must then be called to generate the .sim
file, which is read to build the network in memory.

The next step usually involves setting input/output ports, defining clocks, and/or
precharging some of the nodes. After input values have been established, their effect can
be propagated through the network with the following commands. The step command allows
input values to propagate through the network for n (default: 1) time units.
The cycle command cycles n (default: 1) times through the clock, which is defined by the
clock command. For each phase of the clock, the network is simulated until it stabilizes.
The run command has the same effect as the step command, except that it runs
until the network stabilizes. The cont command continues the simulation for n (default:
infinite) cycles. If n is not specified, the simulation will not stop until the network stabilizes.

During the simulation, nodes can be traced with the trace command. Every time the
state of a traced node changes, the change is reported to the user. The states of nodes
can also be displayed at any time with the print command.

Let's design a 4-bit binary counter from the beginning. Here are a few things that have
to be done first:

•  Choose a name for our counter, say CTR;

•  Decide the functionality of the counter: increment every clock cycle if the control line
is high;

•  Name the input/output ports, say CLK and nCLK for the clocks, T for the control line,
and C[4] for the four output bits.

Then we are ready to write down the description of our counter at the functional level. Our
first version may look like this:

cell CTR;       /* a binary counter */
bit T;
bit CLK, nCLK;
bit C[4];
simulate C;

/* version 1 */
CTR(T, CLK, nCLK)
in bit T, CLK, nCLK;
{
    out bit C[4];
    state unsigned count = 0;
    int i;

    if ((T == 1) && (CLK == 1))
        count++;
    for (i = 0; i < 4; i++)
        C[i] = ((count & (01 << i)) ? 1 : 0);
}


Note that the above description contains no information whatsoever about how the
counter is going to be implemented. All you need is a precise description that runs fast. Of
course, if this were a larger circuit, you might very well want to simulate it at this time. In
the next version, we begin considering how this counter will be constructed. How about four
T flip-flops connected together serially? The following is the description of our T flip-flop
at the functional level.

cell TFLIP;     /* T type flip-flop */
bit T;
bit CLK, nCLK;
bit Q;
simulate C;

TFLIP(T, CLK, nCLK)
in bit T, CLK, nCLK;
{
    out bit Q;
    state bit S = 0;

    if ((T == 1) && (CLK == 1))
    {
        if (S == 0)
            S = 1;
        else if (S == 1)
            S = 0;
        else
            S = X;
    }
    Q = S;
}


It is easy to see that the name of our T flip-flop is TFLIP, and that it has a control line T
and two clocks CLK and nCLK as inputs, and Q as output.

Having our T flip-flop as a subcell, the second version of our counter may contain more
information on how the counter is constructed:

cell CTR;       /* a binary counter */
bit T;
bit CLK, nCLK;
bit C[4];
use TFLIP[4];
simulate C;

/* version 2 */
CTR(T, CLK, nCLK)
in bit T, CLK, nCLK;
{
    out bit C[4];
    state unsigned count = 0;
    bit carry[3];

    carry[0] = (T & C[0]);
    carry[1] = (T & C[0] & C[1]);
    carry[2] = (T & C[0] & C[1] & C[2]);

    map TFLIP[0][0] (T, CLK, nCLK; C[0]);
    map TFLIP[0][1] (carry[0], CLK, nCLK; C[1]);
    map TFLIP[0][2] (carry[1], CLK, nCLK; C[2]);
    map TFLIP[0][3] (carry[2], CLK, nCLK; C[3]);
}

Again, if the circuit were larger, you should simulate it to find any errors in
the specification or in the interconnection. You should simulate the whole counter, but you
could also simulate the T flip-flop alone. After the verification, we proceed to refine our T
flip-flop and produce the geometrical layout using Magic.
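
The kind of check this verification performs can be sketched outside the simulator. The plain C model below is an illustration only: it assumes two-valued logic (no X state), a single clock input instead of the CLK/nCLK pair, and hypothetical function names ctr_v1, ctr_v2 and counters_agree. It steps a behavioural counter and a four-flip-flop carry-chain counter side by side and reports whether their outputs ever differ.

```c
#include <assert.h>

/* Version 1: behavioural counter; only the low 4 bits are visible. */
static unsigned count_v1 = 0;

void ctr_v1(int T, int CLK, int C[4])
{
    int i;
    if (T == 1 && CLK == 1)
        count_v1++;
    for (i = 0; i < 4; i++)
        C[i] = (int)((count_v1 >> i) & 1u);
}

/* Version 2: four T flip-flops driven by a carry chain. */
static int S[4] = {0, 0, 0, 0};

void ctr_v2(int T, int CLK, int C[4])
{
    int t[4], i;
    /* Toggle inputs are computed from the state before the clock edge. */
    t[0] = T;
    t[1] = T & S[0];
    t[2] = T & S[0] & S[1];
    t[3] = T & S[0] & S[1] & S[2];
    if (CLK == 1)
        for (i = 0; i < 4; i++)
            if (t[i])
                S[i] ^= 1;
    for (i = 0; i < 4; i++)
        C[i] = S[i];
}

/* Clock both models n times with T high; return 1 if they always agree. */
int counters_agree(int n)
{
    int a[4], b[4], k, i;
    assert(n >= 0);
    for (k = 0; k < n; k++) {
        ctr_v1(1, 1, a);
        ctr_v2(1, 1, b);
        for (i = 0; i < 4; i++)
            if (a[i] != b[i])
                return 0;
    }
    return 1;
}
```

Running more than 16 steps exercises the wrap-around from 15 back to 0, which is where an error in the carry chain would be most likely to show.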

To verify the correctness of our layout, we can now simulate CTR at the functional level
while simulating the T flip-flop at the switch level. The only modification needed for this
purpose is to change the line simulate C to simulate MAGIC in the description file for
TFLIP and to use the second version of CTR. Finally, the whole circuit of CTR is laid out
using Magic, and the whole thing is simulated at the switch level. Its schematic representation
is shown in Fig. 3.3.

Figure  3.3:  Schematic  representation  of  a  4-bit  binary  counter

This is only a very small example, and you may think that the functional-level
descriptions are unnecessary. That is not the case if you have a much larger circuit. In fact,
you may need more than one version of the functional-level description, such as those for CTR.
They are not redundant, in the sense that the more abstract, non-constructive version of
the description, usually produced at the beginning of the design, may still be useful at a later
stage. After the correctness of the circuit has been verified, these non-constructive versions
can be used for faster simulation.




Chapter  4 


Automatic  Placement 


The automatic layout problem is one of the most difficult as well as one of the most important
problems encountered in the physical design of VLSI circuits, and it has attracted much
attention in the last decade. It consists of two primary functions:
mapping the circuit elements, which we will refer to as basic modules, onto locations on the
layout surface, and interconnecting them according to a set of design rules [MeC 80]. The
former is called placement, and the latter is called routing. Although these two steps are
highly related, we will concentrate on the placement problem only.

Good placement is a key step of automatic layout. It determines how much area the
circuit will occupy, and the longest length as well as the total length of the interconnection
wiring, which in turn limits the performance of the circuit. A poor placement could even
make routing impossible. Although this suggests that the two steps should be done at the
same time, and some researchers are actually doing that [Loo 79], the usual way is to
consider routing as one of the key objectives while performing the automatic placement.
The actual routing is done after the placement. Another important objective is
to minimize the area taken by the modules.

In  this  part  of  the  thesis,  we  will  talk  about  automatic  placement  problems  in  custom 
design  style,  although  some  of  the  ideas  could  also  be  used  in  other  design  styles. 

4.1      The  Placement  Problem

The placement problem consists of determining locations for the basic modules of a circuit on
the layout surface. Typically, a basic module represents a functional unit of the logic design
such as an adder or a register, but in practice it could be anything. The input to the
placement problem is a complete specification of a circuit's logical design and a number of
requirements on the final layout. Here are some of the inputs:

•  A  set  of  basic  modules; 

•  Shapes and aspect ratio allowances of each basic module;

•  List  of  nets  that  interconnect  basic  modules; 

•  Aspect  ratio  requirement  for  the  final  layout; 

A net is another term for a wire, and can be formally defined as a set of basic modules. A
net $n = \{m_{i_1}, m_{i_2}, \ldots, m_{i_k}\}$ represents a wire that connects basic modules
$m_{i_1}, m_{i_2}, \ldots, m_{i_k}$ all together. In this thesis, for simplicity, we only deal with
nets that connect exactly two modules. If a net connecting three or more modules is needed,
it can be replaced by a set of nets, each of which connects only two modules. So the above
net n could be replaced by a set of nets:

Another possible input is a list of ports, which form the connection channel to the
outside circuitry. Usually the placement procedure is responsible for determining the locations
of these ports, but this is not discussed in this thesis. Instead, we assume that all ports are
located at the "wiring center" (which will be defined later) of a module.

The goal of placement is to position basic modules on the layout surface such that they do
not overlap with each other and routing can be done without violating design rules, while
optimizing any number of (usually conflicting) objectives imposed by the designer. Some
common objectives include minimizing:

•  layout  area; 

•  wire  length  (longest  or  sum  of); 

•  power  dissipation; 

•  delay  propagation; 

•  aspect  ratio; 

•  a  weighted  sum  of  those  above. 

Unfortunately, the placement problem with any one of the above as its objective function
is NP-hard [Don 80], [SaB 80]. Hence all attempts at solving the placement
problem are heuristic.

4.2      Previous  Works 

Many heuristics have been used to solve the placement problem. One of the most popular
methods is partitioning-based placement [Lau 80], [She 89], [BHS 88]. It essentially
consists of a top-down min-cut process, which divides
both the layout surface and the set of modules into two halves. It then tries to place one
half of the modules on one half of the surface, and the other half of the modules on the other
half of the surface, recursively. The resulting floorplan is a slicing structure.

Another popular placement algorithm is simulated annealing. Starting from an initial
(complete) placement, it tries to find a better one by making a slight change to the previous
placement. By occasionally allowing a transformation to a worse configuration, it avoids being
trapped in local optima, and is hence capable of finding the global optimum, at least in theory.
Many automatic layout systems have adopted this method or its variations [MaG 88], [WoL 86],
[Gro 87], [KIB 87], [GrS 84].

Force-directed placement takes a totally different approach [For 87], [HWA 76]. It solves
the placement problem by applying a physical model, where each pair of connected basic
modules is considered to be joined by a spring whose stiffness is proportional to the number
of connections between the modules. Then a simulation is done in which modules are
allowed to move in response to the spring forces.

The bottom-up iterative improvement technique was introduced [MWL 87] in response to the
amount of time consumed by earlier methods such as simulated annealing.
In this method, a layout is represented by a binary tree with leaves representing basic
modules and internal nodes specifying ways to combine two portions of a layout. The idea is to
iteratively improve the current layout in a bottom-up fashion, from the leaves to the root of
the tree representing the layout. At each node, a set of heuristic rules is applied to find
possible improvements. Our automatic placement system, which is described
in detail in the next chapter, adopts this technique as its main algorithm.


Chapter  5


The  Placement  Generator 


As part of our integrated system, a placement generator was developed. It is an interactive
placement tool that can handle L-shape modules as well as rectangular modules, using a
method similar to that of [WoL 87] to deal with L-shape modules. The input of
the placement generator consists of a complete logical specification of the circuit and some
parameters that are part of the objective function to be optimized. Most of the parameters
can be changed interactively. The algorithm used here is a mixture of the bottom-up iterative
improvement technique [MWL 87] and the simulated annealing technique. The bottom-up
iterative improvement technique serves as the main part of the procedure, while simulated
annealing is employed only to enable us to jump out of local optima.

5.1      Layout  Representations 

In  this  section,  we  will  describe  how  a  layout  is  represented  in  the  computer,  what  are  the 
inputs  to  the  placement  generator,  and  how  the  quality  of  a  layout  is  measured. 

5.1.1      Circuit  Specification 

Circuit specification, as part of the input of the placement generator, consists of a set of n
basic modules named $m_1, m_2, \ldots, m_n$. Each module is given an orientation which indicates
the shape of the module. Rectangular modules have orientation 0; L-shape modules
have orientations from 1 to 4, as shown in Fig. 5.1.


Figure  5.1:  Orientations  of  modules 

Dimensions are also provided for each module, and are fixed throughout the automatic
placement process. We define the wiring center of a module to be the center of gravity of
its bounding box. For a rectangular module, the wiring center so defined is just its own
center of gravity, while for an L-shape module it is the center of its bounding box.
Wiring centers are used to compute approximations of wire
lengths in the layout. In the placement generator, the length of a wire from module $m_i$ to
module $m_j$ is approximated by the Manhattan distance from the wiring center of $m_i$ to
that of $m_j$. That is to say, all pins of a module are considered to be at its wiring
center.
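
The two definitions above translate directly into code. This is a minimal sketch under assumptions of our own: the struct layouts, the field names and the function names wiring_center and wire_length are illustrative and are not taken from the placement generator.

```c
#include <assert.h>

/* Bounding box of a module (rectangular or L-shape).
 * (x, y) is the lower-left corner; w and h are width and height. */
struct bbox { double x, y, w, h; };

struct point { double x, y; };

static double dabs(double v) { return v < 0.0 ? -v : v; }

/* Wiring center: the center of the module's bounding box. */
struct point wiring_center(struct bbox b)
{
    struct point p;
    assert(b.w >= 0.0 && b.h >= 0.0);
    p.x = b.x + b.w / 2.0;
    p.y = b.y + b.h / 2.0;
    return p;
}

/* Approximate wire length between two modules: the Manhattan
 * distance between their wiring centers. */
double wire_length(struct bbox a, struct bbox b)
{
    struct point p = wiring_center(a);
    struct point q = wiring_center(b);
    return dabs(p.x - q.x) + dabs(p.y - q.y);
}
```

For example, a 4 x 2 module at the origin and a 2 x 6 module whose lower-left corner is at (10, 0) have wiring centers (2, 1) and (11, 3), giving an estimated wire length of 9 + 2 = 11.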

Modules may be rotated by multiples of 90°, flipped along the X-axis or Y-axis,
or any combination of these during the automatic placement.

Another part of the circuit specification is an $n \times n$ interconnection matrix
$C = (c_{ij})_{n \times n}$, where the $c_{ij}$ are natural numbers. Here $c_{ij}$ is the wiring
density between module $m_i$ and module $m_j$.

5.1.2     Placement  Tree

We represent a layout by a binary tree called a placement tree. Each leaf in a placement
tree corresponds to a basic module. Each internal node represents a super-module, which
consists of the basic modules belonging to the subtree rooted at the internal node.
Super-modules can also be rectangular or have one of the four L-shapes, just like basic
modules. They can also be rotated by multiples of 90°, flipped along the X-axis or Y-axis, or
any combination of these during the automatic placement. The placement tree is just a way
of specifying the relative positions of modules. Fig. 5.2 shows a layout and its corresponding
placement tree.

Figure  5.2:  A  non-slicing  structure.  (a)  Non-slicing  structure;  (b)  Placement  tree

The layout of a super-module is determined completely by the operators associated
with each node in the subtree. Two operators are associated with each internal node, one
binary and one unary. Only one operator, a unary operator, is associated with each leaf.
There are four different binary operators: $\times_1$, $\times_2$, $+_1$, $+_2$; and there are
five different unary operators: $\otimes_1$, $\otimes_2$, $\otimes_3$, $\otimes_x$, $\otimes_y$.
Roughly speaking, $\times_1$ and $\times_2$ allow two modules to be
positioned side by side, while $+_1$ and $+_2$ allow two modules to be positioned one on top
of the other. $\otimes_1$, $\otimes_2$ and $\otimes_3$ rotate the module by 90°, 180° and 270°
respectively, while $\otimes_x$ and $\otimes_y$ flip the module along the X-axis and Y-axis
respectively. See Appendix B for their detailed definitions.

Because L-shape modules can be handled, certain layouts that would be impossible
to represent with a slicing structure can be easily represented. Fig. 5.2 gives such an example.

5.1.3      Cost  Function 

Let $d_{ij}$ be the Manhattan distance between the wiring centers of basic modules i and j,
$1 \le i, j \le n$. The objective function optimized by the placement generator has the following
form:

$C = \alpha A + \beta W + \gamma R$        (5.1)

where A is the area taken by the layout, W is the total wire length given by
$\sum_{1 \le i, j \le n} c_{ij} d_{ij}$, and R is the aspect ratio of the layout. The three constants
$\alpha$, $\beta$, and $\gamma$ can be changed interactively during the automatic placement
process. For instance, by choosing $\alpha = 1$, $\beta = 0$ and $\gamma = 0$, only the area
taken by the modules will be minimized.

There is a switch that lets the designer choose to optimize the longest wire length
instead of the total wire length in the layout. So W in (5.1) could also represent the longest
wire length in the layout if desired.
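
A minimal sketch of how the cost might be evaluated, covering both meanings of W. The names wire_term, layout_cost and struct weights are hypothetical, the matrices are assumed to be stored row by row, and the sum is taken over all index pairs; the aspect ratio is passed in as a number because its exact definition is not given here.

```c
#include <assert.h>
#include <stddef.h>

/* The three interactively adjustable constants of the cost function. */
struct weights { double alpha, beta, gamma; };

/* Wire term W. c and d are n x n matrices stored row by row.
 * If use_longest is zero, W is the sum of c[i][j] * d[i][j] over all
 * i, j; otherwise W is the longest wire among connected pairs. */
double wire_term(size_t n, const int *c, const double *d, int use_longest)
{
    double w = 0.0;
    size_t i, j;
    assert(c != NULL && d != NULL);
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            double dij = d[i * n + j];
            if (use_longest) {
                if (c[i * n + j] > 0 && dij > w)
                    w = dij;
            } else {
                w += c[i * n + j] * dij;
            }
        }
    }
    return w;
}

/* Weighted cost: alpha * A + beta * W + gamma * R. */
double layout_cost(struct weights k, double area, double wire, double ratio)
{
    assert(area >= 0.0);
    return k.alpha * area + k.beta * wire + k.gamma * ratio;
}
```

Keeping the wire term separate from the weighted sum means the switch between total and longest wire length touches only one function.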

The cost of the layout may change every time we make an adjustment to the
placement tree, because the placement tree determines the location of each module
relative to the others. The time needed to compute the total area A is constant, since
at each node in the placement tree we keep the dimensions of the module represented by
the node. For the same reason, the aspect ratio R can also be computed in constant time.
Computing the total wire length W takes $O(n)$ time, though. This is because after each
adjustment we have to walk through the placement tree and recompute the wiring centers of
all the basic modules. Then the terms $c_{ij} d_{ij}$ are calculated again wherever one or both
of the wiring centers of the basic modules $m_i$ and $m_j$ may have changed.

5.2  Outline  of  the  Algorithm 

The algorithm starts by using the cluster growth method to construct an initial placement tree.
The objective of this initial placement is to construct a placement tree that combines the
most strongly connected modules first. The algorithm then loops until some termination
conditions are satisfied. During each iteration it does a post-order walk through the
current placement tree, which we call the bottom-up refinement procedure. At each node, all
possible operators are tried and the one that results in the best overall layout is chosen for
the node. At the end of each iteration, if a better layout has been found, it is saved and
displayed on the screen; otherwise, the simulated annealing procedure is called, in the hope
of jumping out of the local optimum. At any time, the designer is free to interrupt the
automatic placement process, edit the current layout by hand through the graphics
interface, and then resume the placement process.

In the following sections, we will discuss each of the procedures in detail.

5.3  Initial  Placement 

Let's define the connectivity between two (possibly super) modules X and Y to be:

where $i \in X$ ranges over all basic modules $m_i$ that are part of the super-module X, and
$j \in Y$ has a similar meaning. The initial placement tree is constructed iteratively by combining
smaller trees into bigger ones. Initially each basic module is in a separate tree consisting
of the module itself. Then the algorithm iterates until the forest becomes a single tree. During
each iteration, two trees in the forest are found such that the connectivity between them
is the maximum. These two trees are then replaced by a single tree obtained by
combining the two. The binary operator at the root is chosen randomly, and no unary
operator is given to the root at this time. Some kind of heuristic could be used here to
choose good binary or unary operators for the root, but this is easily done through the first
iteration of our bottom-up refinement procedure.

The  time  complexity  of  this  step  is  0{n^),  where  n  is  the  number  of  basic  modules  in 
the  circuit. 
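
The cluster growth loop can be sketched as follows. This is an illustration under stated assumptions, not the placement generator's code: connectivity between two trees is taken to be the sum of $c_{ij}$ over their basic modules, the function name cluster_growth, the MAXN limit and the array-based tree encoding are invented for the example, the random choice of binary operator is left out, and the straightforward pair search makes no attempt to match the complexity quoted above.

```c
#include <assert.h>

#define MAXN 64   /* illustrative limit on the number of basic modules */

/* Build a merge tree over n basic modules (leaves 0..n-1). Internal
 * nodes are numbered n..2n-2 in creation order; left[k] and right[k]
 * give the children of internal node k, so both arrays need 2n-1
 * entries. c is the symmetric n x n interconnection matrix, stored
 * row by row. Returns the index of the root. */
int cluster_growth(int n, const int *c, int left[], int right[])
{
    static long conn[MAXN][MAXN];  /* connectivity between live trees */
    int node[MAXN];                /* tree node currently held by each slot */
    int alive[MAXN];
    int i, j, next = n, remaining = n;

    assert(n >= 1 && n <= MAXN);
    for (i = 0; i < n; i++) {
        node[i] = i;
        alive[i] = 1;
        for (j = 0; j < n; j++)
            conn[i][j] = c[i * n + j];
    }
    while (remaining > 1) {
        int bi = -1, bj = -1;
        /* Find the pair of live trees with maximum connectivity. */
        for (i = 0; i < n; i++) {
            if (!alive[i]) continue;
            for (j = i + 1; j < n; j++) {
                if (!alive[j]) continue;
                if (bi < 0 || conn[i][j] > conn[bi][bj]) { bi = i; bj = j; }
            }
        }
        /* Merge slot bj into slot bi under a new internal node. */
        left[next] = node[bi];
        right[next] = node[bj];
        node[bi] = next++;
        alive[bj] = 0;
        for (j = 0; j < n; j++) {
            conn[bi][j] += conn[bj][j];   /* sums add when trees merge */
            conn[j][bi] = conn[bi][j];
        }
        remaining--;
    }
    for (i = 0; i < n; i++)
        if (alive[i])
            return node[i];
    return -1;  /* not reached for n >= 1 */
}
```

Because connectivity is assumed to be a plain sum, the connectivity of a merged tree to any other tree is just the sum of the two old values, which is why a single row update per merge suffices.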

5.4      Bottom-Up  Refinement 

In this phase of the algorithm, we walk through the placement tree in post order. For each
node encountered, we try all possible operators, binary (internal nodes only) and unary, and
find the one that results in the best overall layout. A binary operator and a unary operator
can exist at the same node. So trying a new binary operator means trying it with the old
unary operator; similarly, trying a new unary operator means trying it with the old binary
operator. We replace the old operator with the new one if they are different. After
completing the whole walk, we keep track of the best layout found so far; if nothing has been
improved, we call the simulated annealing procedure, which hopefully will make the next
iteration a fruitful one by slightly "messing up" the current layout.

Although at most one unary operator can be associated with each node in the
placement tree, several unary operators can be applied to a node, with the effect of
applying all of them, one after another. This is done by applying the first unary
operator to the node and changing the orientations and dimensions of the modules within
the two children of the node, so that the layout looks just as if the first operator had
been applied. The first operator associated with the node is then eliminated. Now we have room
to apply the rest of the unary operators in the same way. This makes the application of any
combination of unary operators possible.

During each iteration, we visit each node in the placement tree exactly once. We spend
$O(n)$ time at each node, due to the computation of the cost function. So each iteration,
i.e. each bottom-up refinement step, takes $O(n^2)$ time.
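
The walk itself is short once the cost function is treated as a black box. The sketch below is a simplified illustration: the node layout, the integer encoding of operators, the refine function and the cost_fn hook are assumptions, and toy_cost is a stand-in that merely counts operators differing from an arbitrary target so that the walk can be exercised without a real layout evaluator.

```c
#include <assert.h>
#include <stddef.h>

#define N_BINARY 4   /* the four binary operators */
#define N_UNARY  6   /* 0 = no unary operator, then 3 rotations, 2 flips */

struct node {
    struct node *left, *right;   /* both NULL for a leaf */
    int binary;                  /* 0..N_BINARY-1, ignored at leaves */
    int unary;                   /* 0..N_UNARY-1 */
};

/* Cost of the whole layout rooted at root (hypothetical hook). */
typedef double (*cost_fn)(const struct node *root);

/* Try every value of one operator field; keep the cheapest.
 * Returns 1 if the field ended up different from its old value. */
static int best_op(int *field, int n_ops, struct node *root, cost_fn cost)
{
    int old = *field, best = *field, op;
    double best_cost = cost(root);
    for (op = 0; op < n_ops; op++) {
        double c;
        *field = op;
        c = cost(root);
        if (c < best_cost) { best_cost = c; best = op; }
    }
    *field = best;
    return best != old;
}

/* Post-order walk: refine both subtrees, then this node.
 * Returns 1 if any operator in the subtree was changed. */
int refine(struct node *nd, struct node *root, cost_fn cost)
{
    int changed = 0;
    if (nd == NULL)
        return 0;
    assert(root != NULL && cost != NULL);
    changed |= refine(nd->left, root, cost);
    changed |= refine(nd->right, root, cost);
    if (nd->left != NULL)   /* binary operators exist on internal nodes only */
        changed |= best_op(&nd->binary, N_BINARY, root, cost);
    changed |= best_op(&nd->unary, N_UNARY, root, cost);
    return changed;
}

/* Toy cost for demonstration only: counts operators that differ from
 * an arbitrary target (binary 2, unary 3). */
double toy_cost(const struct node *nd)
{
    double c;
    if (nd == NULL)
        return 0.0;
    c = (nd->unary != 3) ? 1.0 : 0.0;
    if (nd->left != NULL && nd->binary != 2)
        c += 1.0;
    return c + toy_cost(nd->left) + toy_cost(nd->right);
}
```

A return value of 0 from refine corresponds to a fruitless iteration, which is the condition under which the simulated annealing procedure would be invoked.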

5.5      Simulated  Annealing 

Every time the bottom-up procedure is completed and nothing has been improved,
the simulated annealing procedure is called. The purpose of this procedure is to break
out of the local optimum so that the next time the bottom-up procedure is called, hopefully
some improvement will be achieved.

The simulated annealing procedure proceeds by first randomly choosing a node, internal
or leaf, from the current placement tree. It then randomly chooses an operator for this
node. Besides the binary and unary operators, two more operators may be applied to an
internal node:

$S_1$: Swap the two children;

$S_2$: Swap one of its children with one of its grandchildren;
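
The two swap operators amount to a few pointer exchanges on the placement tree. The sketch below uses hypothetical names (struct tnode, swap_children, swap_child_grandchild) and fixes one particular reading of the second operator as an assumption: the right child is exchanged with a grandchild reached through the left child, since exchanging a child with its own descendant would not leave a valid tree.

```c
#include <assert.h>
#include <stddef.h>

struct tnode {
    struct tnode *left, *right;   /* both NULL for a leaf */
    int id;                       /* label used only for illustration */
};

/* First swap operator: exchange the two children of an internal node. */
void swap_children(struct tnode *nd)
{
    struct tnode *tmp;
    assert(nd != NULL && nd->left != NULL && nd->right != NULL);
    tmp = nd->left;
    nd->left = nd->right;
    nd->right = tmp;
}

/* Second swap operator (one possible reading): exchange the right
 * child with the left child's left child. Returns 0 on success, or -1
 * if the left child is a leaf and so has no child to exchange. */
int swap_child_grandchild(struct tnode *nd)
{
    struct tnode *tmp;
    assert(nd != NULL && nd->left != NULL && nd->right != NULL);
    if (nd->left->left == NULL)
        return -1;
    tmp = nd->right;
    nd->right = nd->left->left;
    nd->left->left = tmp;
    return 0;
}
```

Unlike the binary and unary operators, these moves change which modules are grouped together, so they let the search reach tree shapes that the bottom-up refinement alone cannot.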

After choosing the operator, replace the old operator with the new one, or perform the
swap operation if it is $S_1$ or $S_2$. If this results in a better overall layout, make the change
permanent; otherwise, still make the change permanent with probability:

$e^{-\Delta C / T}$

where $\Delta C$ is the difference in cost between the old and new layouts, and T is the temperature,
which increases as more consecutive fruitless bottom-up iterations are experienced.
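
As a quick numerical illustration of this acceptance rule, consider a change that worsens the cost by 5. The values are made up, and $\Delta C$ is taken here to be the cost increase of the new layout over the old one, which is an assumption about the sign convention.

```latex
\[
\Delta C = 5:\qquad
e^{-\Delta C / T}\Big|_{T=2} = e^{-2.5} \approx 0.082,
\qquad
e^{-\Delta C / T}\Big|_{T=10} = e^{-0.5} \approx 0.607 .
\]
```

So the same worsening move is kept about 8% of the time at the lower temperature and about 61% of the time at the higher one, which is why raising T after fruitless iterations makes it easier to leave a local optimum.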

This procedure is repeated a number of times before we return to the bottom-up
refinement procedure. The exact number of iterations changes according to how well we
have done in previous bottom-up refinement procedures, just as the temperature does. But
the range of this number is usually small, say from 20 to 30. If we consider this number
to be a constant, then the time spent in the simulated annealing procedure is $O(n)$, which is
needed to compute the cost function.

5.6  Graphics  Interface 

A goal of most automatic placement tools is to free designers from the tedious details of
physical VLSI circuit design. Because the placement problem is hard, no matter
how good an automatic placement tool is, there are some cases where the program just
cannot find a reasonably good layout within a reasonably short period of time. Keeping this
in mind, we have included a graphics editing function in our system, which allows the user to
move modules by hand from one place to another, rotate them, or flip them. Users can stop
the automatic placement process at any time, edit the current layout by hand, and then
continue the automatic placement from the new configuration. This human intervention
can sometimes save a lot of time.

5.7  An  Example 

We have run the automatic placement generator on the 16-bit CPU designed as an
experiment under our CAD environment. Fig. 6.1 shows the schematic layout produced by hand,
and Fig. 5.3 shows the result generated by the automatic placement generator without
any human intervention. Note that all ports are considered by our placement tool to be
located at the center of each module. Also, no routing space is left around modules; the
modules should contain routing space themselves.

Figure  5.3:  Placement  of  a  CPU  generated  by  the  placement  generator


Chapter  6

The  Schematic  Layout 
Representation 


Currently, there are two levels at which a circuit can be specified in order to be
simulated under our environment: CHDL and the Magic layout format. As another level
between these two, we have developed the schematic layout representation. While CHDL
is used primarily for behavioral descriptions of circuits, the schematic layout representation
is intended for structural descriptions. From the designer's point of view, schematic layouts are
much easier to create than Magic layouts; they also make the creation of HDL descriptions
much easier. Actually, creating a schematic layout of what a designer wants to design
is a good starting point. It provides the designer with a general, clear picture of the whole
circuit in a very short period of time.

One of the biggest problems with hardware description languages is the specification of
interconnections between modules. If a circuit contains many modules, each with a number
of ports, it is very difficult to specify which ports are connected to which. Conventional
methods, such as representing a net as a list of port names, are hard to create, hard
to read, and fragile under modification. With the help of a graphics editor,
interconnections between modules can be specified and modified with the click of a button.
A graphical picture of nets as line segments is also much easier to read and less error-prone.
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6.1  Circuit  Representation 

At the schematic level, a module in a circuit is represented as a rectangle on the screen.
Its size and shape can be taken as estimates of the size and shape of the final Magic
layout, or they can be meaningless, depending on whether the designer chooses to use this
information later. Each input or output port of a module is represented as a small square
on the boundary of the module. Again, its position can be an estimate of its final location
in the module's Magic layout, or it can be meaningless. Nets, defined as lists of ports,
are represented by lists of line segments connecting the squares that represent the ports.
ASCII names can be assigned to modules and ports, but not to nets. Fig. 6.1 gives an
example of a 16-bit CPU designed in schematic layout format.

Modules are organized hierarchically: modules can contain submodules. Basic modules
(modules that do not contain any submodules) usually contain a piece of C code describing
their logical behavior. Only one copy of a module is kept, even if it is used as a
submodule more than once in the hierarchy.

Modules described in schematic layout are considered to be at a lower level than those
described in C, since they contain more information than C modules do, such as geometrical
dimensions and shapes. Meanwhile, they are considered to be at a higher level than those
described in Magic layout, since they contain less information than Magic modules do. For
instance, unlike Magic modules, they do not have any transistors.

6.2  The  Schematic  Layout  Editor 

A menu-driven tool, called the schematic layout editor, allows users to create and
manipulate modules, ports, and nets in almost any way they want. Using the editor,
modules can be created, moved, resized, rotated, or flipped in very much the same way
as in most graphics editors. Ports can also be created or moved. Nets are created by
clicking buttons at the ports that should be connected. A different button signals the end
of the list, and an automatic router is then called to draw line segments connecting the
ports.

Figure 6.1: Schematic Layout of a 16-bit CPU

As mentioned before, the schematic layout of a circuit is organized hierarchically,
with modules containing submodules. Designers can walk up and down the hierarchy tree
and modify the layout. Since only one copy is kept for each module, a modification
affects all occurrences of the module if it is used many times in the hierarchy. At the
leaves of the hierarchy tree, all modules are basic modules. A text editor, such as "vi",
can be invoked directly from the schematic layout editor to compose a piece of CHDL
program for the module.

The information contained in a schematic layout can be extracted and then simulated,
together with the behavioral specifications of basic modules, using Msim. This not only
eliminates the need to manually specify interconnections between submodules using
"map" statements, but is also more efficient in terms of simulation speed. The placement
generator described previously can also be invoked to optimize the schematic layout in
terms of estimated area, wire length, or both.

6.3      Routing  Algorithm 

As part of the schematic layout editor, we have implemented an automatic router. It allows
the user to draw a net by pointing out a list of ports only; the router then connects the
ports for the user. Since the main purpose is to allow designers to specify interconnections
efficiently, the automatic router does only global routing. Wires run through the middle of
routing channels, and no channel routing is done. To give a clear picture of which ports
are connected to which, nets can be highlighted, one at a time. The method used by the
automatic router is described in the following algorithm.

Algorithm 6.1 Global Routing Algorithm.

INPUT: A list of rectangles as modules; and a list of ports, as nets, on the boundaries of
rectangles.

OUTPUT: A list of line segments connecting the ports in the given list.

METHOD:

1. Divide the routing space into "channels" by extending the boundaries of modules. Each
channel is of rectangular shape.

2. Combine small channels into larger ones as much as possible, as long as the result is of
rectangular shape.

3. Create an undirected graph G with vertices representing channels and edges representing
adjacency of channels. Two vertices are connected iff the two channels they represent
are adjacent in the layout.

4. Do a breadth-first search on G, and find the minimum spanning tree containing the
channels in which one or more ports on the given list are located.

5. Connect the ports on the list by going through the channels on the minimum spanning
tree. Wires go from one channel to another through the midpoint of their common
border.
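
Steps 3 and 4 of Algorithm 6.1 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the router's actual code: the channel graph is assumed to be given already as an adjacency dictionary, the names `route_net`, `adjacency`, and `terminals` are hypothetical, and the tree is grown greedily by repeated breadth-first searches, which yields a tree connecting the port channels but is not guaranteed to be a true minimum tree.

```python
from collections import deque


def route_net(adjacency, terminals):
    """Return channel-to-channel edges of a tree linking all terminal channels.

    adjacency: dict mapping a channel name to a list of adjacent channels.
    terminals: channels that contain at least one port of the net.
    """
    terminals = list(terminals)
    tree = {terminals[0]}              # channels already on the tree
    edges = set()                      # undirected edges, as frozensets
    remaining = set(terminals[1:]) - tree
    while remaining:
        # Breadth-first search outward from every channel already on the tree.
        parent = {channel: None for channel in tree}
        queue = deque(sorted(tree))
        found = None
        while queue:
            channel = queue.popleft()
            if channel in remaining:
                found = channel
                break
            for neighbour in adjacency[channel]:
                if neighbour not in parent:
                    parent[neighbour] = channel
                    queue.append(neighbour)
        if found is None:
            raise ValueError("terminal channels are not connected")
        # Walk back to the tree, adding the channels and edges on the path.
        while parent[found] is not None:
            edges.add(frozenset((found, parent[found])))
            tree.add(found)
            found = parent[found]
        remaining -= tree
    return edges


# Hypothetical channel graph: A-B, B-C, B-D, D-E; ports lie in A, C and E.
channels = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"],
            "D": ["B", "E"], "E": ["D"]}
net_edges = route_net(channels, ["A", "C", "E"])
```

Each returned edge corresponds to one crossing between adjacent channels, which step 5 would draw through the midpoint of their common border.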
Chapter 7


A  Design  Example 


For the purpose of demonstration, we have designed a microprocessor under our CAD
environment. The configuration and microinstruction format are from [Man 82]. It is a
16-bit microprocessor controlled by a microprogram stored in a control memory. For
simplicity, there is no operating system. As soon as the system is booted, it loads a batch
job from a card reader and then performs the job. A line printer is available for user jobs
to print out their results. The card reader is also available to user jobs.

The design started with a schematic layout, created using our schematic layout editor.
The top level contains three modules: CARDREADER, PRINTER, and CPU. At the second
level, CPU is decomposed into seventeen submodules, all of which are currently basic
modules. Fig. 7.1 shows the final layout produced by the editor.

Functional behaviors of basic modules were then specified using CHDL. No behavioral
specification was written for CPU, since we already had a very clear decomposition
of it. As an example, a complete listing of the behavioral specification for the
accumulator AC is given below.

Figure 7.1: Schematic Layout of a 16-bit Microprocessor

    cell AC;
    out bit ac[16];
    in bit b[16];
    in bit a[16];
    in bit sel1;
    in bit sel4;
    in bit sel3;
    in bit sel2;
    in bit phi1;
    in bit phi2;
    in bit Vdd;
    in bit GND;
    simulate C;

    AC(b, a, sel1, sel4, sel3, sel2, phi1, phi2, Vdd, GND)
    in bit b[16];
    in bit a[16];
    in bit sel1;
    in bit sel4;
    in bit sel3;
    in bit sel2;
    in bit phi1;
    in bit phi2;
    in bit Vdd;
    in bit GND;
    {
        bit Bsim.tmp;
        state out bit ac[16];
        state bit f[16];
        int aa, bb, cc;
        unsigned int t1, t2, byteToInt();
        unsigned int sel;
        int i;

        if (phi1 == 1)
        {
            t1 = byteToInt(a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7]);
            t2 = byteToInt(a[8], a[9], a[10], a[11], a[12], a[13], a[14], a[15]);
            aa = (t1 << 8) | t2;
            t1 = byteToInt(b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7]);
            t2 = byteToInt(b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);
            bb = (t1 << 8) | t2;
            sel = byteToInt(0, 0, 0, 0, sel1, sel2, sel3, sel4);

            switch (sel)
            {
            /* Arithmetic Micro-Operations */
            case 0:     /* ac <- a + b (0000) */
                cc = aa + bb;
                break;
            case 1:     /* ac <- a - b (0001) */
                cc = aa - bb;
                break;
            case 2:     /* ac <- a + 1 (0010) */
                cc = ++aa;
                break;
            case 3:     /* ac <- a - 1 (0011) */
                cc = --aa;
                break;
            case 4:     /* ac <- complement of a (0100) */
                cc = ~aa;
                break;
            case 5:     /* ac <- complement of a + 1 (0101) */
                cc = ~aa + 1;
                break;
            case 6:     /* ac <- a + complement of b (0110) */
                cc = aa + ~bb;
                break;
            case 7:     /* ac <- a + complement of b + 1 (0111) */
                cc = aa + ~bb + 1;
                break;

            /* Logic Micro-Operations */
            case 8:     /* ac <- 0 (1000) */
                cc = 0;
                break;
            case 9:     /* ac <- a XOR b (1001) */
                cc = aa ^ bb;
                break;
            case 10:    /* ac <- a AND b (1010) */
                cc = aa & bb;
                break;
            case 11:    /* ac <- a OR b (1011) */
                cc = aa | bb;
                break;
            case 12:    /* ac <- b (1100) */
                cc = bb;
                break;
            case 13:    /* NOP (1101) */
                cc = byteToInt(f[0], f[1], f[2], f[3], f[4], f[5], f[6], f[7]);
                cc = (cc << 8) |
                     byteToInt(f[8], f[9], f[10], f[11], f[12], f[13],
                               f[14], f[15]);
                break;
            case 14:    /* ac <- shift left of a (1110) */
                cc = aa << 1;
                break;
            case 15:    /* ac <- shift right of a (1111) */
                cc = aa >> 1;
                break;
            default:
                break;
            }   /* switch */

            for (i = 0; i < 16; i++)
                f[15-i] = (cc >> i) & 01;
        }
        else if (phi2 == 1)
        {
            for (i = 0; i < 16; i++)
                ac[i] = f[i];
        }
    }
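
As a cross-check of the sixteen select codes, the phi1 branch of the listing can be mirrored on plain integers. The Python sketch below is only an illustrative model, not part of the CHDL description: the name `ac_next` is hypothetical, `held` stands for the value currently stored in f, and the select code is assumed to be passed as an integer whose most significant bit corresponds to sel1.

```python
MASK16 = 0xFFFF  # the accumulator keeps only the low 16 bits of the result


def ac_next(sel, a, b, held):
    """Return the 16-bit value latched into f during phi1.

    sel  -- 4-bit select code (0..15), as in the switch statement
    a, b -- 16-bit unsigned input values
    held -- value currently held in f, kept unchanged by code 13 (NOP)
    """
    results = {
        0: a + b,          # ac <- a + b
        1: a - b,          # ac <- a - b
        2: a + 1,          # ac <- a + 1
        3: a - 1,          # ac <- a - 1
        4: ~a,             # ac <- complement of a
        5: ~a + 1,         # ac <- complement of a + 1
        6: a + ~b,         # ac <- a + complement of b
        7: a + ~b + 1,     # ac <- a + complement of b + 1
        8: 0,              # ac <- 0
        9: a ^ b,          # ac <- a XOR b
        10: a & b,         # ac <- a AND b
        11: a | b,         # ac <- a OR b
        12: b,             # ac <- b
        13: held,          # NOP
        14: a << 1,        # ac <- shift left of a
        15: a >> 1,        # ac <- shift right of a
    }
    return results[sel] & MASK16


difference = ac_next(7, 5, 3, 0)   # a + complement of b + 1, i.e. a - b
```

Masking with `MASK16` plays the role of the final loop in the listing, which copies only bits 0 through 15 of cc into f.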


The schematic layout and behavioral specifications were then simulated using the
multi-level simulator. After the simulation succeeded, we started refining modules by
creating geometry layouts for them with the Magic layout editor. The whole system was
simulated after every refinement to ensure its correctness. Also, the automatic placement
tool described in Chapter 5 was used to optimize the layout; the result is shown in Fig. 5.3.

7.1      Microprocessor  Configuration 

As shown in Fig. 7.2, the system consists of the following key components:

• Accumulator Register AC: a 16-bit register capable of performing 16 operations on
its two 16-bit inputs. Its result is also a 16-bit number.

• Memory Module MM: a 2048 x 16-bit RAM with an address bus of 11 bits and a
data bus of 16 bits.

• Memory Address Register MAR: an 11-bit register providing addresses for MM.

• Memory Buffer Register MBR: a 16-bit register providing and/or receiving data
for/from the MM.

• Program Counter PC: an 11-bit register which serves as the program counter.

• Control Memory CM: a 128 x 20-bit ROM which stores the microinstructions
controlling the microprocessor. Like MM, it has its own address register.

• Control Memory Address Register CAR: a 7-bit register providing addresses for
CM.

• Subroutine Return Register SBR: a 7-bit register which stores the return address
when a subroutine is called in the microprogram.

• Decoder: a module that interprets the control bits of the encoded microinstructions
and controls the behavior of the other modules.

• Card Reader CARDREADER: a module which simulates a card reader.

• Printer PRINTER: a module which simulates a line printer.

Figure 7.2: Block diagram of the microprocessor

One thing that is not shown in the diagram of Fig. 7.2 is the control signals going from
the Decoder to all the other components of the system.

The system employs two-phase clocking. During clock phase phi1, all the inputs are
valid; during clock phase phi2, the outputs of all modules are changed, and no input should
be considered valid. All registers are master-slave registers.
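
The two-phase discipline can be illustrated with a small behavioral model of a master-slave register. The Python sketch below is a hypothetical illustration rather than any module of the design: the class and method names are invented, and it assumes only what is stated above, namely that inputs are sampled during phi1 and outputs change during phi2.

```python
class MasterSlaveRegister:
    """Behavioral model of a register under two-phase clocking."""

    def __init__(self, width):
        self.mask = (1 << width) - 1
        self.master = 0   # value captured during phi1
        self.slave = 0    # value visible on the output

    def phi1(self, data_in):
        # Inputs are valid during phi1: capture them, output unchanged.
        self.master = data_in & self.mask

    def phi2(self):
        # Outputs change during phi2: expose the captured value.
        self.slave = self.master

    @property
    def out(self):
        return self.slave


mar = MasterSlaveRegister(11)     # an 11-bit register, like MAR
mar.phi1(0x7FF + 5)               # wider values are truncated to 11 bits
before = mar.out                  # output is still the old value
mar.phi2()
after = mar.out                   # output now shows the captured value
```

The accumulator listing follows the same pattern: the result is computed into f while phi1 is high and copied to ac while phi2 is high.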

7.2      Microinstruction  Format 

Each  microinstruction  residing  in  the  control  memory  consists  of  20  bits  divided  into  four 
functional  fields,  as  shown  in  Fig.  7.3.  The  numbers  on  top  of  each  field  are  the  number  of 
bits  contained  in  that  field. 


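      3    3    3    2    2      7
    +----+----+----+----+----+-------+
    | F1 | F2 | F3 | CD | BR |  ADF  |
    +----+----+----+----+----+-------+
    |<- Micro-ops->|

Figure 7.3: Microinstruction format (20 bits)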

The micro-ops field specifies micro-operations for the microprocessor. The CD field
selects status bit conditions, the BR field specifies the type of branch, and the ADF field
contains an address.

The micro-ops field is subdivided into three subfields, F1, F2, and F3, of three bits each.
The three bits in each subfield are encoded to specify seven distinct micro-operations, as
listed in Table 7.1. This means that a maximum of three micro-operations can be performed
during each clock cycle.


The CD (condition) field consists of two bits which are encoded to specify four status
bit conditions, as listed in Table 7.2. The BR (branch) field consists of two bits, as shown
in Table 7.3. It is used, in conjunction with the address field ADF, to choose the address
of the next microinstruction.

    F1    Micro-operation            Symbol

    000   None                       NOP
    001   AC <- AC + MBR             ADD
    010   AC <- 0                    CLRAC
    011   AC <- AC + 1               INCAC
    100   AC <- MBR                  BRTAC
    101   MAR <- MBR(AD)             BRTAR
    110   MAR <- PC                  PCTAR
    111   M <- MBR                   WRITE

    F2    Micro-operation            Symbol

    000   None                       NOP
    001   AC <- AC - MBR             SUB
    010   AC <- AC OR MBR            OR
    011   AC <- AC AND MBR           AND
    100   MBR <- M                   READ
    101   MBR <- AC                  ACTBR
    110   MBR <- MBR + 1             INCBR
    111   MBR(AD) <- PC              PCTBR

    F3    Micro-operation            Symbol

    000   None                       NOP
    001   AC <- AC XOR MBR           XOR
    010   AC <- complement of AC     COM
    011   shl AC                     SHL
    100   shr AC                     SHR
    101   PC <- PC + 1               INCPC
    110   PC <- MBR(AD)              BRTPC
    111   Reserved                   -

Table 7.1: Micro-operation fields bit assignment

    CD    Condition    Symbol    Comments

    00    1            U         Unconditional (always = 1)
    01    MBR(I)       I         Indirect address bit
    10    AC(S)        S         Sign bit of AC
    11    AC = 0       Z         Zero value in AC

Table 7.2: CD (condition) field bit assignment
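
To make the field layout concrete, the following Python sketch splits a 20-bit microinstruction word into its fields. It is an illustration under stated assumptions: the field order F1, F2, F3, CD, BR, ADF is read left to right with F1 in the most significant bits, the 7-bit width of ADF is inferred from the 20-bit total and the 7-bit CAR, and the names `FIELDS` and `decode_microinstruction` are hypothetical.

```python
# (name, width in bits), most significant field first; widths sum to 20.
FIELDS = (("F1", 3), ("F2", 3), ("F3", 3), ("CD", 2), ("BR", 2), ("ADF", 7))


def decode_microinstruction(word):
    """Split a 20-bit microinstruction word into a dict of field values."""
    if not 0 <= word < (1 << 20):
        raise ValueError("microinstruction must fit in 20 bits")
    fields = {}
    shift = 20
    for name, width in FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


# PCTAR (F1 = 110), no F2/F3 operation, CD = U (00), BR = JMP (00), ADF = 1.
example_word = int("110" "000" "000" "00" "00" "0000001", 2)
example_fields = decode_microinstruction(example_word)
```

With these assumptions, `example_fields` describes a microinstruction that copies PC into MAR and jumps unconditionally to control memory address 1.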

7.3      Assembly  Language 

Our machine language consists of 16 bits per instruction. The assembly instruction format
is shown in Fig. 7.4. The numbers on top of each field are the number of bits contained in
that field.

    BR    Symbol    Function

    00    JMP       CAR <- ADF if condition = 1
                    CAR <- CAR + 1 if condition = 0
    01    CALL      CAR <- ADF, SBR <- CAR + 1 if condition = 1
                    CAR <- CAR + 1 if condition = 0
    10    RET       CAR <- SBR (Return from subroutine)
    11    MAP       CAR(2-5) <- MBR(OP), CAR(1) <- 0,
                    CAR(6-7) <- 1 (Micro-operation mapping)

Table 7.3: BR (branch) field bit assignment
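
The next-address selection of Table 7.3 can be summarized in a few lines of Python. This is a sketch under assumptions, not the decoder itself: the function name and argument order are hypothetical, and the MAP case assumes that CAR bits are numbered from 1 (most significant) to 7 (least significant), so that the mapped address is 4 x OP + 3. That reading agrees with the ORG addresses 3, 7, 11, ... used in the control memory listing.

```python
def next_car(br, condition, car, adf, sbr, opcode):
    """Return (new CAR, new SBR) for one microinstruction.

    br        -- 2-bit BR field
    condition -- value (0 or 1) of the status bit selected by the CD field
    car, sbr  -- current 7-bit CAR and SBR contents
    adf       -- 7-bit address field of the microinstruction
    opcode    -- 4-bit OP field of the instruction held in MBR
    """
    incremented = (car + 1) & 0x7F
    if br == 0b00:                                   # JMP
        return (adf if condition else incremented), sbr
    if br == 0b01:                                   # CALL
        if condition:
            return adf, incremented
        return incremented, sbr
    if br == 0b10:                                   # RET
        return sbr, sbr
    return ((opcode & 0xF) << 2) | 0b11, sbr         # MAP


store_entry = next_car(0b11, 1, 2, 0, 0, 0b0110)[0]  # OP 0110 is STORE
```

Here `store_entry` evaluates to 27, the control memory address at which the STORE microroutine begins.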


      1     4          11
    +---+------+---------------+
    | I |  OP  |     ADDR      |
    +---+------+---------------+

Figure 7.4: Assembly instruction format (16 bits)

Since we have 2048 x 16 bits of memory, we need 11 bits for memory references in our
instructions. The last 11 bits are used for this purpose. Setting aside one bit (the first
bit) as the indirect addressing bit, we have four bits left for specifying the operation.
This allows a maximum of 16 assembly instructions. Since some of the instructions do not
need the indirect bit (they do not make memory references), we have assigned new meanings
to such instructions with the indirect bit on. This allows us to have more than 16 assembly
instructions. Table 7.4 shows all the instructions, their binary codes, and a brief
description of each. In the table, m represents the address field of the instruction. If the
I field of an instruction in the table is 0 (or 1), this field must be 0 (or 1) for this
instruction; if the field is blank, either 1 or 0 can be put there, and its value is
significant.
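
The 16-bit instruction word can be packed and unpacked as in the Python sketch below. It assumes that "first bit" means the most significant bit and "last 11 bits" the least significant ones; the function names `encode_instruction` and `decode_instruction` are hypothetical.

```python
def encode_instruction(indirect, opcode, addr):
    """Pack I (1 bit), OP (4 bits) and ADDR (11 bits) into a 16-bit word."""
    if indirect not in (0, 1):
        raise ValueError("indirect bit must be 0 or 1")
    if not 0 <= opcode < 16:
        raise ValueError("opcode must fit in 4 bits")
    if not 0 <= addr < 2048:
        raise ValueError("address must fit in 11 bits")
    return (indirect << 15) | (opcode << 11) | addr


def decode_instruction(word):
    """Return (indirect, opcode, addr) for a 16-bit instruction word."""
    return (word >> 15) & 1, (word >> 11) & 0xF, word & 0x7FF


# STORE (OP = 0110) with the indirect bit set, referring to location 15.
store_indirect = encode_instruction(1, 0b0110, 15)
```

Under these assumptions `store_indirect` is the bit pattern 1 0110 00000001111, and `decode_instruction` recovers the three fields from it.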


    Symbol    I    OP      Description

    ADD            0000    Add M to AC
    SUB            0001    Subtract M from AC
    AND            0010    And (bitwise) M to AC
    OR             0011    Or (bitwise) M to AC
    XOR            0100    Exclusive-Or M to AC
    LOAD           0101    Load M into AC
    STORE          0110    Store AC into M
    CIR       0    0111    Circulate right AC, fill leftmost with 0
    RET       1    0111    Branch indirectly to m
    CIL       0    1000    Circulate left AC, fill rightmost with 0
    INC       0    1001    Increment AC
    HALT      1    1001    Halt computer
    JZA            1010    Jump to m if AC is Zero
    JNA            1011    Jump to m if most significant bit of AC = 1
    JMP            1100    Branch unconditionally
    CALL      0    1101    Save return address in m and jump to m + 1
    IN        0    1110    Read from card-reader into AC
    CLA       1    1110    Clear AC
    CMA       0    1111    Complement AC
    OUT       1    1111    Write AC to line printer

Table 7.4: Assembly Instructions

In the following section, we give the microprogrammed implementation of these
instructions. As an example, here is the assembly-language program that loads user jobs
from the card reader every time the system is booted. It is stored in locations 0 through 15
of the memory module. The ORG (origin) instruction in the listing is a pseudo-instruction
that indicates the memory location at which the following instruction or operand will be
placed. We use the same pseudo-instruction in our microprograms as well. Also note that
two consecutive slashes and everything after them on a line form a comment.

            ORG   0
    LOOP:   IN                      // read from card reader
            STORE I INDRCT          // store
            INC                     // increment the word just read
            JZA   USER              // if all 1's was read in, done
            LOAD  INDRCT            // update the indirect address
            INC
            STORE INDRCT
            JMP   LOOP              // go to read next instruction
            ORG   15
    INDRCT: 0000 0000 0001 0000     // starting address where user job is loaded
            ORG   16
    USER:                           // user job starts here


7.4      Contents  of  Control  Memory 

When the microprocessor is booted, both the program counter PC and the control memory
address register CAR are set to zero. The first thing the microprocessor does is therefore
to execute the microinstruction residing at location 0 of the control memory. So we have
arranged the control memory such that the microinstructions performing the fetch cycle
are located at address 0. They are followed by a list of microinstruction groups, four in
each group. Usually, each such group implements one assembly instruction. Some groups
implement two assembly instructions, which are distinguished by the value of the indirect
field. The following are the contents of our control memory, which implements our
assembly instructions.


            ORG 0
    FETCH:  PCTAR           U   JMP    NEXT
            READ, INCPC     U   JMP    NEXT
            BRTAR           U   MAP

            ORG 3
    ADD:    NOP             I   CALL   INDRCT
            READ            U   JMP    NEXT
            ADD             U   JMP    FETCH

            ORG 7
    SUB:    NOP             I   CALL   INDRCT
            READ            U   JMP    NEXT
            SUB             U   JMP    FETCH

            ORG 11
    AND:    NOP             I   CALL   INDRCT
            READ            U   JMP    NEXT
            AND             U   JMP    FETCH

            ORG 15
    OR:     NOP             I   CALL   INDRCT
            READ            U   JMP    NEXT
            OR              U   JMP    FETCH

            ORG 19
    XOR:    NOP             I   CALL   INDRCT
            READ            U   JMP    NEXT
            XOR             U   JMP    FETCH

            ORG 23
    LOAD:   NOP             I   CALL   INDRCT
            READ            U   JMP    NEXT
            BRTAC           U   JMP    FETCH

            ORG 27
    STORE:  NOP             I   CALL   INDRCT
            ACTBR           U   JMP    NEXT
            WRITE           U   JMP    FETCH

            ORG 31
    CIR:    NOP             I   JMP    NEXT+1
            SHR             U   JMP    FETCH
    RET:    NOP             U   CALL   INDRCT
            BRTPC           U   JMP    FETCH

            ORG 35
    CIL:    SHL             U   JMP    FETCH

            ORG 39
    INC:    NOP             I   JMP    NEXT+1
            INCAC           U   JMP    FETCH
    HALT:   HALT

            ORG 43
    JZA:    NOP             Z   JMP    NEXT+1
            NOP             U   JMP    FETCH
            NOP             I   CALL   INDRCT
            BRTPC           U   JMP    FETCH

            ORG 47
    JNA:    NOP             S   JMP    NEXT+1
            NOP             U   JMP    FETCH
            NOP             I   CALL   INDRCT
            BRTPC           U   JMP    FETCH

            ORG 51
    JMP:    NOP             I   CALL   INDRCT
            BRTPC           U   JMP    FETCH

            ORG 55
    CALL:   PCTBR, BRTAC    U   JMP    NEXT
            WRITE, ACTBR    U   JMP    NEXT
            INCBR           U   JMP    NEXT
            BRTPC           U   JMP    FETCH

            ORG 59
    CLA:    CLRAC           I   JMP    FETCH
    IN:     INPUT1          U   JMP    NEXT
            INPUT2          U   JMP    NEXT
            BRTAC           U   JMP    FETCH

            ORG 63
    CMA:    NOP             I   JMP    NEXT+1
            COM             U   JMP    FETCH
    OUT:    OUTPUT          U   JMP    FETCH

            ORG 123
    INDRCT: READ            U   JMP    NEXT
            BRTAR           U   RET

Note that locations 66 through 122 are empty. In fact, by careful arrangement, we
can pack these microinstructions into a 64 x 20-bit ROM.

7.5      Simulation  Results 

As mentioned previously, our microprocessor starts loading a program from the
CARDREADER as soon as it is booted. Upon finishing the loading, it jumps to the start of
the program just loaded and begins execution. We have tested several programs. The
program listed below multiplies two unsigned 8-bit binary numbers. The result is a 16-bit
binary number. The subroutine ADD could be expanded into the main program, since it is
called from only one place. We chose to implement it as a subroutine to demonstrate the
way subroutine calls work. A subroutine call is implemented by storing the return address
at the first location of the subroutine and then jumping to the second location of the
subroutine. The return instruction is implemented by jumping indirectly through the
beginning of the subroutine, where the return address was stored by the caller.


            ORG   16
    LOOP:   LOAD  N1
            AND   MASK
            JZA   L
            CALL  ADD
    L:      LOAD  N1
            CIR
            STORE N1
            LOAD  N2
            CIL
            STORE N2
            LOAD  CTR
            INC
            STORE CTR
            SUB   N8
            JZA   DONE
            JMP   LOOP
    DONE:   LOAD  P
            OUT
            HALT
    ADD:    0                       // return address
            LOAD  P
            ADD   N2
            STORE P
            RET
    // data
    MASK:   0000 0000 0000 0001     // mask to get last bit
    N1:     0000 0000 1000 1010     // multiplicand
    N2:     0000 0000 0110 1010     // multiplier
    P:      0000 0000 0000 0000     // product
    N8:     0000 0000 0000 1000     // number 8
    CTR:    0000 0000 0000 0000     // counter to remember #bits done
    END:    1111 1111 1111 1111     // program end marker
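
The shift-and-add scheme used by the program can be followed more easily in a high-level form. The Python sketch below mirrors the loop of the listing step by step; the function name `multiply_8bit` is hypothetical, and 16-bit masking stands in for the register width of the machine.

```python
def multiply_8bit(n1, n2):
    """Multiply two unsigned 8-bit numbers by shift-and-add."""
    product = 0                                   # P
    for _ in range(8):                            # CTR counts up to N8 = 8
        if n1 & 1:                                # LOAD N1; AND MASK; JZA L
            product = (product + n2) & 0xFFFF     # CALL ADD: P <- P + N2
        n1 >>= 1                                  # CIR: shift N1 right
        n2 = (n2 << 1) & 0xFFFF                   # CIL: shift N2 left
    return product


# The data words of the listing: N1 = 1000 1010, N2 = 0110 1010.
result = multiply_8bit(0b10001010, 0b01101010)
```

For the operands stored in the listing (138 and 106), `result` is 14628, which fits in the 16-bit product word P.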


To demonstrate the performance of our system, we have simulated the microprocessor
using several test programs, including the above multiplication program. The test was
conducted on a Sun workstation. Different combinations of description levels for modules
and submodules were tried out, and the results are shown in Table 7.5.

    test program      case1     case2     case3

    multiplication    181.3     249.3     890.8
    division          203.1     296.7     931.0
    fibonacci         219.7     301.5     971.2

Table 7.5: Performance comparison


In Table 7.5, the three cases mean:

case1: All modules are simulated at functional-level.

case2: Only REG7 is simulated at switch-level.

case3: The following modules are simulated at switch-level: AOO, REG7, REG11, REG16,
MUX7, MUX11, MUX16. The rest are simulated at functional-level.

The microprocessor starts by running a loader which loads in a test program. It then
transfers control to the test program just loaded. One thing that Table 7.5 shows clearly
is that functional-level simulation is much faster than switch-level simulation. We should
take advantage of this by simulating at switch-level only those modules that are newly
designed. That is, as soon as we have fully tested a module, we should switch to simulating
it at functional-level while testing other modules.


Chapter  8 

Conclusions  And  Future  Works 


8.1  Conclusions 

We  have  designed  and  implemented  an  integrated  VLSI  CAD  environment  that  supports 
almost  all  phases  of  the  VLSI  circuit  design  cycle,  from  high-level  circuit  description  down 
to  mask  generation.  Tools  that  have  been  integrated  under  the  environment  include  a 
multi-level  simulator,  an  automatic  placement  tool,  a  schematic  layout  editor,  and  the 
geometrical  layout  editor  Magic  developed  at  UC  Berkeley. 

Experience shows that design productivity can be greatly increased under the environment.
As an experiment, I implemented a basic 16-bit microprocessor using the system. It took me
less than a week to produce a working version of the schematic layout and the functional
descriptions of the basic modules. This would have been impossible if we had had to lay out
everything in Magic. Also, it would have been much more difficult if we had had to specify
the 267 nets via "map" statements.

8.2  Future  Works 

Our system has demonstrated many of its advantages, but it is far from complete. Many
useful features and tools are still waiting to be added to the system. The following is a
partial list.

8.2.1      Extensions 

The schematic layout editor has made the specification of interconnections between modules
very easy. Nets are created by clicking buttons at the ports that should be connected.
There are times when we want to connect an array of ports, say a[32], with another array
of ports, say f[32], in such a regular way that a[i] is connected with f[i], where i goes
from 0 to 31. In this case, it would be much easier if we could issue a command such as
connect a[32] f[32], and have the editor automatically connect a[0] with f[0], a[1] with
f[1], ..., and a[31] with f[31].
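
One possible reading of such a command is sketched below in Python. The command syntax is taken from the example above, but everything else is an assumption made for illustration: the function name `expand_connect`, the regular expression, and the choice to return the nets as pairs of port names. It is not a description of an existing editor feature.

```python
import re

# Matches an array port such as "a[32]": a name followed by a width.
_ARRAY_PORT = re.compile(r"^([A-Za-z_]\w*)\[(\d+)\]$")


def expand_connect(command):
    """Expand 'connect a[N] f[N]' into N two-port nets."""
    words = command.split()
    if len(words) != 3 or words[0] != "connect":
        raise ValueError("expected: connect NAME[N] NAME[N]")
    parsed = []
    for word in words[1:]:
        match = _ARRAY_PORT.match(word)
        if match is None:
            raise ValueError(f"not an array port: {word!r}")
        parsed.append((match.group(1), int(match.group(2))))
    (left, left_width), (right, right_width) = parsed
    if left_width != right_width:
        raise ValueError("array widths differ")
    return [(f"{left}[{i}]", f"{right}[{i}]") for i in range(left_width)]


nets = expand_connect("connect a[32] f[32]")
```

Each pair in `nets` would then be handed to the automatic router exactly as if the designer had clicked the two ports by hand.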

Another useful command for the schematic layout editor would expand a subcell into an
array (one-dimensional or two-dimensional) of subcells. The connect command mentioned
above would then become even more powerful.

The current implementation of the schematic layout editor includes a built-in automatic
router which does global routing. Designers specify nets by clicking mouse buttons at
the ports being connected, and the automatic router does the routing and draws the
virtual wires. Although the main purpose of the router is to allow designers to specify
interconnections easily, its output could also serve as a guideline for physical routing at
a later stage of the design. Toward this end, the automatic router should consider
information such as the density of routing channels. It should also provide a graphics
interface allowing designers to edit certain critical paths manually.

By collecting some extra information while simulating a circuit, the multi-level simulator
could provide facilities for architectural-level simulation. For example, the simulator
could keep track of the number of times a module was activated, which designers could then
use to predict the performance of the system or to determine its architectural parameters.
Furthermore, the simulator could keep track of the number of times a node was changed,
which designers could use for performance analysis at circuit-level.

In our current implementation, all nodes at functional level are considered input nodes.
Signals coming in through input ports have the strength u, and signals going out
through output ports also have the strength u. While this assumption simplifies the CHDL
language, it also loses accuracy to some degree. Once we have refined the cell into, say,
Magic layout, the assumption may become incorrect, and errors that would not otherwise
exist may occur. To deal with this problem, we can easily introduce notations to specify
signal strengths at functional level, and let our simulator Msim take them into account.

The switch-level simulation could be sped up if nodes were stored in memory by static
connected components of the ternary switch graph, hence minimizing page faults.

It  would  be  more  accurate  if  the  automatic  placement  tool  could  consider  ports  on 
boundaries  of  modules,  instead  of  at  centers  of  modules.  Automatic  routing  tools  could  be 
added  to  perform  global  routing  and  channel  routing.  Also,  the  automatic  placement  tool 
should  be  able  to  operate  directly  on  the  schematic  layout. 

8.2.2      Debugging  Toolkit 

For the purpose of debugging, the multi-level simulator should be able to report the causes
of changes of a net when asked. Even better, imagine that all reports from the simulator
were graphical, shown for example on the schematic layout or on the Magic layout. The
problem could then be pinpointed in seconds.

Designers should be able to place breakpoints anywhere in the circuit, at any level.
They should also be able to attach a general boolean expression to a breakpoint and ask
the simulator to stop the simulation conditionally. For example, the simulator could be
asked to stop if a specific node becomes 1, or if one of the inputs of a specific module
changes.
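
A minimal sketch of such conditional breakpoints, assuming a hypothetical simulator
interface in which one simulation step maps a dictionary of node values to the next one.
The names run_until_break and counter_step are illustrative and are not part of Msim; a
toy 2-bit counter stands in for the circuit.

```python
def run_until_break(step, state, breakpoints, max_steps=1000):
    """Advance the simulation until some breakpoint condition holds.

    step:        function mapping the current node-value dict to the next one.
    breakpoints: dict of name -> predicate(previous_state, current_state).
    Returns (steps taken, name of the triggered breakpoint or None, state).
    """
    for n in range(1, max_steps + 1):
        prev, state = state, step(state)
        for name, cond in breakpoints.items():
            if cond(prev, state):
                return n, name, state
    return max_steps, None, state

def counter_step(s):
    # Toy 2-bit counter standing in for one simulation step.
    return {"q0": 1 - s["q0"], "q1": s["q1"] ^ s["q0"]}

start = {"q0": 0, "q1": 0}

# "Stop if a specific node becomes 1."
rises = {"q1 becomes 1": lambda prev, cur: prev["q1"] != 1 and cur["q1"] == 1}
# "Stop if one of the inputs of a module changes" (q0 plays the input here).
changed = {"input changed": lambda prev, cur: any(prev[n] != cur[n] for n in ("q0",))}
# A general boolean expression over several nodes.
both = {"q0 and q1": lambda prev, cur: cur["q0"] == 1 and cur["q1"] == 1}

hit_rises = run_until_break(counter_step, start, rises)
hit_changed = run_until_break(counter_step, start, changed)
hit_both = run_until_break(counter_step, start, both)
```

Passing both the previous and the current state to each predicate lets a single mechanism
express level conditions ("node is 1") and edge conditions ("node changed").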

8.2.3     New  Tools 

New hardware description languages at the functional level, such as VHDL, could be added
to the system. Other circuit representations at intermediate levels, such as the gate level,
would certainly make refinement conversions smoother.

The multi-level simulator operates only in unit-delay mode; timing information is
completely ignored. Although we have tools such as Crystal for timing analysis of Magic
layouts, it would be hard to do any such analysis on circuits with components described
at higher levels. For this purpose, we need a mechanism in our HDL to specify timing
information for those modules whose Magic layouts are not yet available. It would be
better still if our multi-level simulator could also use timing information to operate in
linear-delay mode.

The technique of compiling circuit layouts into low-level machine code for simulation,
as done in [BBB 87] and [WHP 87], may be explored for the switch-level simulation of
Msim. For each module, several relocatable object files would be produced: one from the
circuit layout and one from the functional specification, say. They would then be
selectively linked together with those of other modules, depending on the level at which
we want each module to be simulated.
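
The selection step alone can be illustrated with a small sketch. It makes strong
simplifying assumptions: Python callables stand in for relocatable object files, an
inverter stands in for a module, and the function link and the level names "functional"
and "switch" are hypothetical. It does not show compilation to machine code.

```python
def link(modules, levels, default="functional"):
    """Pick one compiled model per module according to the requested level."""
    linked = {}
    for name, models in modules.items():
        level = levels.get(name, default)
        if level not in models:
            raise KeyError(f"module {name!r} has no {level!r} model")
        linked[name] = models[level]
    return linked

def inv_functional(a):
    # Model derived from the functional specification.
    return 1 - a

def inv_switch(a):
    # Model derived from the layout: pull-up conducts when a == 0,
    # pull-down conducts when a == 1.
    return 1 if a == 0 else 0

modules = {
    "inv1": {"functional": inv_functional, "switch": inv_switch},
    "inv2": {"functional": inv_functional},  # layout not available yet
}

# Simulate inv1 at switch level and everything else at functional level.
sim = link(modules, {"inv1": "switch"})
out = sim["inv2"](sim["inv1"](1))
```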

The power of our system would be greatly enhanced if it could interface with other
existing systems. For instance, circuit representations at the functional level and the
schematic level could be transformed into other formats so that a silicon compiler such as
Flamel [Tri 85] could be used to produce VLSI chips directly. Another possibility would
be to transform current circuit representations into a form acceptable to other simulation
engines, such as the Yorktown Simulation Engine [Pfi 82], [Den 82], [KrP 82].


Appendix A

Summary  of  Msim  Commands 


Each Msim command has the following simple syntax:

cmd arg1 arg2 ... argn (Return)

where "cmd" specifies the operation to be performed and arg1 through argn are the
arguments needed for that operation. The arguments are separated by spaces (or tabs), and
the command is terminated by a (Return). The command name can be abbreviated, as long as
you type enough characters to distinguish it from all other commands.
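
The abbreviation rule can be sketched as a unique-prefix lookup. This is an illustrative
reading of the rule, not Msim's actual parser: the functions resolve and parse are
hypothetical, and the table holds only a subset of the command names listed in this
appendix, so the shortest unambiguous abbreviations may differ in the real tool.

```python
COMMANDS = [
    "alias", "clock", "cycle", "exit", "help", "high", "load", "low",
    "merge", "mode", "phigh", "plow", "print", "quit", "reset", "run",
    "source", "step", "trace", "unclock", "unwatch", "watch",
]

def resolve(word, commands=COMMANDS):
    """Expand a command abbreviation to the full command name."""
    if word in commands:
        return word  # an exact name always wins
    matches = [c for c in commands if c.startswith(word)]
    if not matches:
        raise ValueError(f"unknown command: {word}")
    if len(matches) > 1:
        raise ValueError(f"ambiguous command {word!r}: {', '.join(matches)}")
    return matches[0]

def parse(line):
    """Split a command line on spaces or tabs and expand the command name."""
    fields = line.split()
    return resolve(fields[0]), fields[1:]

cmd, args = parse("hi\ta b")
```

With this table, "so" expands to source, while "s" is rejected as ambiguous because it
matches both source and step.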

Many of the Msim commands take node names as parameters. The name of a node is
the label attached to it in the Magic layout, or in the schematic layout, prefixed by the
complete path name consisting of the name of parent cell, grandparent cell, and so on,
separated by backslashes. If two nodes with different labels are connected together, either
name can be used to reference the connected node.

The  following  is  the  list  of  commands  with  their  syntax  and  semantics. 

#   comment  ... 

Lines beginning with # are treated as comments and are ignored. This is useful for
annotating a command file or for temporarily disabling certain commands in it.

?

Print  a  command  summary.  Same  as  the  help  command. 

alias node name1 name2 ...

Define name1, name2, ... to be nicknames of the node node, so that they can be
used to reference the same node. This is especially useful for nodes with very long
names, such as nodes of a subcell deep down the hierarchy tree of the circuit.

clock node phi1 phi2 ...

Define node to be one of the clocks in the circuit, with phi_i as its value during the
i-th phase.

cont [#steps]

Continue the simulation from wherever it left off, for #steps steps if the argument is
provided. Otherwise, continue until the circuit stabilizes.

cycle [#cycles]

Simulate the circuit for #cycles cycles.

exit 

Exit  Msim  and  return  to  system.  Same  as  the  quit  command. 

help 

Print  a  command  summary.  Same  as  the  ?  command. 

high node1 node2 ...

Set a list of nodes, node1 node2 ..., to logic value 1, with strength u.

load file1 file2 ...

Load circuit descriptions from the files file1 file2 ...

low node1 node2 ...

Set a list of nodes, node1 node2 ..., to logic value 0, with strength u.

merge node1 node2 ...

Electrically connect the nodes node1 node2 ...

mode [race | norace]

Set the simulation mode either to detect race conditions (race) or not to detect them
(norace). With no argument, the mode is toggled.

phigh node1 node2 ...

Set a list of nodes, node1 node2 ..., to logic value 1, with strength K2, which is the
strongest among storage nodes.

plow node1 node2 ...

Set a list of nodes, node1 node2 ..., to logic value 0, with strength K2, which is the
strongest among storage nodes.

print [ ? | keyword | node1 node2 ... ]

If a list of node names, node1 node2 ..., is given as arguments, information about
these nodes, such as their logic values, will be printed out. If ? is given as the
argument, the list of valid keywords will be printed out, each with a brief description
of its meaning. The keywords have the following meanings.

all          Print out information about all nodes.
cells        Print out information about all cells.
clocks       Print out information about all defined clocks.
events       Print out the contents of the event queue.
inputs       Print out the current list of input nodes.
traced       Print out information about all traced nodes.
transistors  Print out the list of all transistors.
watched      Print out information about all watched nodes.

If no argument is given, the default argument is watched.

quit 

Exit  Msim  and  return  to  system.  Same  as  the  exit  command. 

reset 

Delete all the information about the current circuit. Msim is then ready to read in
another circuit description.

run [#steps]

Simulate the circuit for #steps steps. If no argument is given, the simulation will
continue until the circuit stabilizes.

source  file 

Read  and  execute  commands  from  file  file. 

step [#steps]

Simulate the circuit for #steps steps. The default is 1 step.

trace node1 node2 ...

Trace the list of nodes node1 node2 ... for changes. Report logic value changes as
soon as they occur.

unclock node1 node2 ...

Undefine nodes node1 node2 ... as clocks.

untrace node1 node2 ...

Delete nodes node1 node2 ... from the list of nodes being traced.

unwatch node1 node2 ...

Delete nodes node1 node2 ... from the list of nodes being watched.

watch node1 node2 ...

Watch the list of nodes node1 node2 ... Print out information about these nodes
every time the simulation stops.

X node1 node2 ...

Set a list of nodes, node1 node2 ..., to logic value X, with strength K1, which is the
weakest among storage nodes.

! cmd

Escape to the operating system temporarily. Execute the command cmd under the
operating system environment.


Appendix B


Definitions  of  Binary  Operators 


The following are definitions of the binary operators used by our automatic placement
system. They are similar to the ones defined in [WoL 87].

Figure B.1: Orientation of left module = 0, orientation of right module = 0
Figure B.2: Orientation of left module = 0, orientation of right module = 1
Figure B.3: Orientation of left module = 0, orientation of right module = 2
Figure B.4: Orientation of left module = 0, orientation of right module = 3
Figure B.5: Orientation of left module = 0, orientation of right module = 4
Figure B.6: Orientation of left module = 1, orientation of right module = 0
Figure B.7: Orientation of left module = 1, orientation of right module = 1
Figure B.8: Orientation of left module = 1, orientation of right module = 2
Figure B.9: Orientation of left module = 1, orientation of right module = 3
Figure B.10: Orientation of left module = 1, orientation of right module = 4
Figure B.11: Orientation of left module = 2, orientation of right module = 0
Figure B.12: Orientation of left module = 2, orientation of right module = 1
Figure B.13: Orientation of left module = 2, orientation of right module = 2
Figure B.14: Orientation of left module = 2, orientation of right module = 3
Figure B.15: Orientation of left module = 2, orientation of right module = 4
Figure B.16: Orientation of left module = 3, orientation of right module = 0
Figure B.17: Orientation of left module = 3, orientation of right module = 1
Figure B.18: Orientation of left module = 3, orientation of right module = 2
Figure B.19: Orientation of left module = 3, orientation of right module = 3
Figure B.20: Orientation of left module = 3, orientation of right module = 4
Figure B.21: Orientation of left module = 4, orientation of right module = 0
Figure B.22: Orientation of left module = 4, orientation of right module = 1
Figure B.23: Orientation of left module = 4, orientation of right module = 2


Figure B.24: Orientation of left module = 4, orientation of right module = 3
Figure B.25: Orientation of left module = 4, orientation of right module = 4